reading .csv files needed to run the report without code to shape variables

I have separated out the code which shapes the variables in the player data and creates the data files needed to run this report. Before running this file, run the file reportcode.Rmd to create all of the data files necessary to knit this report.

#Reading in data files
stry_scale <- read.csv(file = "data/stry_scale.csv")
longral_pos <- read.csv(file = "data/longral_pos.csv")
fed16_longest <- read.csv(file = "data/fed16_longest.csv")
fed16_only <- read.csv(file =  "data/fed16_only.csv")
fed16_scale <- read.csv(file =  "data/fed16_scale.csv")
fed16_df <- read.csv(file =  "data/fed16_df.csv")
fed17_only <- read.csv(file = "data/fed17_only.csv")
fed18_only <- read.csv(file = "data/fed18_only.csv")
fed_scale <- read.csv(file =  "data/fed_scale.csv")
stry16_only <- read.csv(file = "data/stry16_only.csv")
stry17_only<- read.csv(file = "data/stry17_only.csv")
stry_scale <- read.csv(file = "data/stry_scale.csv")
fed16_only_plots <- read.csv(file = "data/fed16_only_plots.csv")
dlb_fedonly <- read.csv(file = "data/dlb_fedonly.csv")
iof_fedonly <- read.csv(file = "data/iof_fedonly.csv")
dlw_fedonly <- read.csv(file = "data/dlw_fedonly.csv")
longral_df <- read.csv(file = "data/longral_df.csv")
fed16_pos <- read.csv(file = "data/fed16_pos.csv")

Analysing Rallies From Matches at The Australian Open using the Hidden Markov Model

This project uses Hidden Markov Models to identify player shot intentions from matches at the Australian Open.

The match data analysed contains information for all rallies 3 shots or longer. Serves and returns of serves have been excluded from the data. The analysis looks at two and three hidden state models to attempt to identify player intentions.

The two hidden state model extends Jeremy Forbes’ research project. The player shot intentions are: attacking or ‘winner’, where the players aim is to win the point on that shot. And ‘return’ where the player’s aim is just to get the ball back into play.

The three hidden states model extends this model to include ‘errors’, where the player unsuccessfully tries to win the point or return the ball into play with that shot.

This analysis is done using the dependent mixture model contained in the depmixS4 package.

Glossary of added variables

Winter Research Project

lastshot - last shot in the point

fedhit - federer hits the shot

isserver - if server hits the shot

winner - if shot is winner (i.e. point doesn’t end in error, as opponent doesn’t reach the ball)

speed.diff - difference in current shot speed vs opponents last shot speed (at impact)

speed.ratio - shot speed/opponent’s previous shot speed

speed1 - speed of shot from impact with racquet

o.angle - angle made by opponents previous shot

p.angle - angle made by player’s current shot

lag.p.angle - angle made by player’s previous shot in rally

lag.speed.ratio - speed ratio of player’s previous shot in rally

oppo.hit - coordinates of opponents previous impact

Summer Research Project

winner.return.error - Classifies whether the current shot ended in winner, was returned into play or ended in error

rally.number - Numerical count of rally number within match

avg.player.speed.diff - difference in player and opponent average speed of movement from start of current shot to end of next in m/s

peak.player.speed.diff - difference in player and opponent peak speed of movement from start of current shot to end of next in m/s

avg.player.speed.ratio - ratio in player and opponent average speed of movement from start of current shot to end of next in m/s

peak.player.speed.ratio - ratio in player and opponent peak speed of movement from start of current shot to end of next in m/s

avg.player.acceleration.diff - difference in player and opponent average speed of movement from start of current shot to end of next in m/s

peak.player.acceleration.diff - difference of player and opponent avg acceleration during shot in m/s2.

avg.player.acceleration.ratio - ratio of player and opponent avg acceleration during shot in m/s2.

peak.player.acceleration.ratio - ratio of player and opponent peak acceleration during shot in m/s2.

p.avg.speed.match - players average speed of movement for the match in m/s

oppo.avg.speed.match - opponents average speed of movement for the match in m/s

p.diff.avg.shot.and.match.movement.speed - difference between players average movement speed for the current shot (from the start of the current shot to the end of the next) compared to their match average in m/s

oppo.diff.avg.shot.and.match.movement.speed - difference between opponents average movement speed for the current shot (from the start of the current shot to the end of the next) compared to their match average in m/s

p.rally.side.change.count - the number of times the player has run from the forehand(backhand) side to the backhand(forehand) side within the rally

oppo.rally.side.change.count - the number of times the opponent has run from the forehand(backhand) side to the backhand(forehand) side within the rally

rally.side.change.count.diff - the difference in the number of times the player and opponent have changed sides within the rally

rally.distance.run.ratio - the ratio player/opponent total distance run within the rally

p.movement.angle.1 - The angle the player runs from the start to the end of their current shot (o.deg directly towards the net 180.deg directly away from the net)

p.movement.angle.2 The angle the player runs from the end of their current shot to the start of their next shot (o.deg directly towards the net 180.deg directly away from the net)

oppo.movement.angle.1 - The angle the opponent runs from the start to the end of their current shot (o.deg directly towards the net 180.deg directly away from the net)

oppo.movement.angle.2 - The angle the opponent runs from the end of their current shot to the start of their next shot (o.deg directly towards the net 180.deg directly away from the net)

lag.oppo.height.off.net - Meters above net of opponents previous shot

lag.p.start.shot.speed - Speed at the start of the arc in m/s of the players previous shot

lag.oppo.start.shot.speed - Speed at the start of the arc in m/s of the opponents previous shot

Modelling

Visual analysis has been used to select a number of new variables from the rallies data set. For the two hidden state model ‘winner’ is the response variable. For the three hidden state model ‘winner.return.error’ is the response variable.

The covariates chosen after visual analysis are “p.start.position.x”, “p.start.position.y”, “oppo.start.position.x”, “oppo.start.position.y”, “p.movement.angle.1”, “p.diff.avg.shot.and.match.movement.speed”, “oppo.diff.avg.shot.and.match.movement.speed”, “diff.p.avg.and.current.shot.speed”, “lag.oppo.height.off.net”

These covariates have been scaled to between 0 and 1 for the analysis.

To identify the best covariates for predicting hidden states of many different players; training models have been created using a step-wise approach to identify the best combination of covariates in a chosen match data set. These training models have then been tested on a different data sets for a comparison of results. The limitation of this step wise approach is that some potentially important covariates have been ommited from the model because of correlation with other covariates. For example shot type, forehand/backhand etc. is correlated with the players x and y coordinates.

Training Models

For step-wise analysis, I began by running all the covariates to find the covariate which changes the log Likelihood of the model the most. I then added this covariate to the model and ran all the remaining covariates to find the next best covariate. I repeated this process until the model is no longer being improved through the addition of new covariates or until there are no more covariates.

The first training model(Training Model 1) has been made using step-wise for two hidden states for the match data Federer vs Berdych 2016. The best combination of covariates is: p.start.position.x, lag.oppo.height.off.net, oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, diff.p.avg.and.current.shot.speed, p.movement.angle.1

The second training model (Training Model 2) has been made using step-wise for two hidden states for the match data Federer vs Berdych 2016/17/18. The best combination of covariates is: p.movement.angle.1, oppo.diff.avg.shot.and.match.movement.speed, p.start.position.y, oppo.start.position.x, oppo.start.position.y, p.diff.avg.shot.and.match.movement.speed, lag.oppo.height.off.net

The third training model (Training Model 3) has been made using step-wise for three hidden states for the match data Strycova vs Garcia 2016/17. The best combination of covariates is: oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, oppo.start.position.y, diff.p.avg.and.current.shot.speed, p.start.position.y, p.diff.avg.shot.and.match.movement.speed, lag.oppo.height.off.net

Intepreting the output

2 state Model

The best two state model by comparing total change in -logLik is Training Model 2 tested on the match data Federer vs Berdych 2016/17/18.

In the attacking state for this model the probability of hitting a winner is 0.255.

In the returning state the probability of hitting a winner is 0.022.

Interpreting the coefficients of the model suggests that.

An increase in p.movement.angle.1 decreases the probability of hitting a winner. If Federer is running directly away from the net he is less likely to hit a winner.

An increase in oppo.diff.avg.shot.and.match.movement.speed increases the probability of hitting a winner. If the Berdych is running faster during the shot, Federer is more likely to hit a winner.

An increases in p.start.position.y decreases the probability of hitting a winner. As Federer moves further away from the centre of the court he is less likely to hit a winner.

An increases oppo.start.position.x decreases the probability of hitting a winner. Federer is more likely to hit a winner when Berdych starts closer to the net.

An increase in oppo.start.position.y decreases the probability of hitting a winner. As Berdych starts further away from the centre of the court Federer is less likely to hit a winner. This is the opposite of what we were expecting.

An Increase in p.diff.avg.shot.and.match.movement.speed decreases the probability of hitting a winner. If Federer is running faster to hit the shot, he is less likely to hit a winner.

An increase in lag.oppo.height.off.net increases the probability of hitting a winner. If Berdych’s shot passes higher over the net, then Federer is more likely to hit a winner.

The covariate with the greatest impact on Federer being in an attacking state is oppo.start.position.y This indicates that Federer has a high probability of hitting a winner if Berdych starts close to the net.

There were some issues with this model. Repeated running of the depmix model in this report can see the attacking state alternate between being defined as state 1 or state 2. The results remain the same for the probability of hitting a winner, but the corresponding states are flipped. We have to be careful in interpreting the results. The attacking state is the one which has a higher value for Re1.1.

library(depmixS4)
## Loading required package: nnet
## Loading required package: MASS
## Loading required package: Rsolnp
library(ggplot2)
#Data set 2, Training Model 2, 2 states.
ds2.mod2.2s <- depmix(winner ~ 1, transition = ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = fed_scale, nstates = 2, family=multinomial("identity"))
 ds2.fm2.2s <- fit(ds2.mod2.2s)
## iteration 0 logLik: -193.3407 
## iteration 5 logLik: -191.8366 
## iteration 10 logLik: -187.6157 
## iteration 15 logLik: -183.4945 
## iteration 20 logLik: -181.7833 
## iteration 25 logLik: -180.798 
## iteration 30 logLik: -179.9897 
## iteration 35 logLik: -179.4552 
## iteration 40 logLik: -179.2184 
## iteration 45 logLik: -179.112 
## iteration 50 logLik: -178.9808 
## iteration 55 logLik: -178.5362 
## iteration 60 logLik: -177.5992 
## iteration 65 logLik: -176.8534 
## iteration 70 logLik: -176.2929 
## iteration 75 logLik: -175.8031 
## iteration 80 logLik: -175.392 
## iteration 85 logLik: -175.0764 
## iteration 90 logLik: -174.8127 
## iteration 95 logLik: -174.5302 
## iteration 100 logLik: -174.1384 
## iteration 105 logLik: -173.5179 
## iteration 110 logLik: -172.5598 
## iteration 115 logLik: -171.8537 
## iteration 120 logLik: -171.3753 
## iteration 125 logLik: -170.8761 
## iteration 130 logLik: -170.339 
## iteration 135 logLik: -169.9635 
## iteration 140 logLik: -169.5568 
## iteration 145 logLik: -169.3549 
## iteration 150 logLik: -169.227 
## iteration 155 logLik: -169.135 
## iteration 160 logLik: -169.0844 
## iteration 165 logLik: -169.0464 
## iteration 170 logLik: -169.0023 
## iteration 175 logLik: -168.9841 
## iteration 180 logLik: -168.9591 
## iteration 185 logLik: -168.938 
## iteration 190 logLik: -168.9197 
## converged at iteration 193 with logLik: -168.9193
 summary(ds2.fm2.2s)
## Initial state probabilties model 
## pr1 pr2 
##   1   0 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2
## (Intercept)                                   0  151.81686
## p.movement.angle.1                            0   52.65853
## oppo.diff.avg.shot.and.match.movement.speed   0 -222.42661
## p.start.position.y                            0   15.27117
## oppo.start.position.x                         0 -232.38970
## oppo.start.position.y                         0  111.40105
## p.diff.avg.shot.and.match.movement.speed      0  325.01427
## lag.oppo.height.off.net                       0  356.70799
## Probalities at zero values of the covariates.
## 1.166201e-66 1 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2
## (Intercept)                                   0 -52.064193
## p.movement.angle.1                            0   6.128489
## oppo.diff.avg.shot.and.match.movement.speed   0  -3.720033
## p.start.position.y                            0   5.008703
## oppo.start.position.x                         0  58.532686
## oppo.start.position.y                         0   3.594083
## p.diff.avg.shot.and.match.movement.speed      0   5.419555
## lag.oppo.height.off.net                       0 -15.735508
## Probalities at zero values of the covariates.
## 1 2.447982e-23 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.0 Re1.1
## St1 0.745 0.255
## St2 0.978 0.022

3 State Models

In attempting to create a three state model:

Firstly I ran Training Model 1 and Training Model 2 as three state models with ‘winner.return.error’ as the response variable. This was unsuccessful in identifying three states, at the zero values of the covariates only two states can be clearly identified.

I then ran the three state models with the p.advantage as the second response variable. The reasoning for this is to observe the intercept of player advantage in different states and to identify players who try to hit a winner when at a disadvantage. This may provide more insight into player shot intentions.

I created a new three state test model with these two response variables for the data set Strycova vs Garcia 2016/17. For the model summaries, at the zero values of the covariates, three clear states can be seen. However with repeated running of these models, the results are not repeatable. The models can be viewed in the appendix.

The three state model may be useful for identifying the probability of the player hitting a winner or an error from an attacking, returning or a defensive state. A larger probability of hitting a winner from a defensive state compared to other players may be an indicator of player risk taking.

We can see from the plots below that players are hitting winners when at a disadvantage. For the purpose of interpreting the intercept of player advantage as a response variable, I left the data unscaled when running the model.

#Plotting proportion of winners by player advantage. 
ggplot(fed16_only, aes(x=p.advantage, y=winner)) + geom_smooth() + xlab("Player Advantage") + ylab("Proportion of Winners") + ggtitle("Federer 2016 Winners by by Player Advantage")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed17_only, aes(x=p.advantage, y=winner)) + geom_smooth() + xlab("Player Advantage") + ylab("Proportion of Winners") + ggtitle("Federer 2017 Winners by by Player Advantage")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed18_only, aes(x=p.advantage, y=winner)) + geom_smooth() + xlab("Player Advantage") + ylab("Proportion of Winners") + ggtitle("Federer 2018 Winners by by Player Advantage")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(stry16_only, aes(x=p.advantage, y=winner)) + geom_smooth() + xlab("Player Advantage") + ylab("Proportion of Winners") + ggtitle("Strycova 2016 Winners by Player Advantage")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Building an Animation of Rallies

Using the package GGAnimate and a top-down court outline taken from “https://github.com/mvparrot/vis-serve/blob/master/report file helper basic plots”. The variables p.start.position.x, p.end.position.x, oppo.start.position.x, oppo.end.position.x, p.start.position.y, p.end.position.y, oppo.start.position.y, oppo.end.position.y, start.x, start.y, projected.ballmark.x and projected.ballmark.y

transition_reveal has been used for filling in gaps in the player and ball position data. To run transition_reveal, a single column was created for the player’s start & end position x coordinates. They are grouped by rally number and shot number and alternate between start and end position. The same thing is done for the player’s position y coordinates. As well as for the opponent’s start and end position x and y coordinates. For animating the ball position, similar columns are created alternating between the start position of the ball and the projected ballmark for both x and y coordinates.

Visualising player state change in the animation

We can take the implied states from the Depmix Model probs and overlay them into the animation to visualise when the player is in an attacking or returning state. In this case we are looking at the longest rally between Federer and Berdych in 2016. Federer’s icon will get larger to very clearly indicate when he is an attacking state. State 2 indicates the player being in an attacking state and State 1 a returning state.

For this example I have taken the state probabilities from the Fitted Depmix model for Federer vs Berdych 2016 and created a vector of implied states for the longest rally. I have defined Federer’s implied states when Berdych is hitting the ball as state 1. As we are running the animation using columns that contain start and end positions, the player state for each shot is duplicated for start and end position.

library(gganimate)
library(tweenr)
## Warning: package 'tweenr' was built under R version 3.5.2
library(transformr)
## Warning: package 'transformr' was built under R version 3.5.2
#Top Down Court View
#
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:MASS':
## 
##     select
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
#--- Packages Required
# require(ggplot2)
# require(plotly)

#--- Outline of the court
court_trace <- data.frame(x = c(-11.89, -11.89, 0, 0, 0, 11.89, 11.89, -11.89, -11.89, 11.89, 11.89, -11.89, -6.4, -6.4, 6.4, 6.4, 6.4, -6.4),
                          y = c(5.49, -5.49, -5.49, 5.49, -5.49, -5.49, 5.49, 5.49, 4.115, 4.115, -4.115, -4.115, -4.115, 4.115, 4.115, -4.115, 0, 0),
                          z = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
net_trace <- data.frame(x = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                        y = c(-5.49,-5.49, -6.4, -6.4, -5.49, 0, 5.49, 6.4, 6.4, 5.49, 5.49),
                        z = c(1.07, 0, 0, 1.07, 1.07, 0.914, 1.07, 1.07, 0, 0, 1.07))
service_trace <- data.frame(x = c(-8, 0, 0, 0, -6.4, -6.4, 0, -6.4, -6.4, -6.4, -6.4, -6.4,  0, 0, -8),
                            y = c(-5.49, -5.49, -4.115, 4.115, 4.115, 0, 0, 0, -4.115, -5.49, 5.49, -4.115, -4.115, 5.49, 5.49),
                            z = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ,0 ,0))
    
axis_labels <- data.frame(x.break = c(-21.89:-11.89, -6.4, 0, 6.4, 11.89),
                          x.label = c("-10m","","","","","-5m","","","","",
                                      "Baseline","Service Line","Net","Service Line","Baseline"),
                          y.break = c(-5.49,-4.115,0,4.115,5.49),
                          y.label = c("Doubles", "Singles","Centre","Singles","Doubles"),
                          z.break = c(0,0.992,2,3,4),
                          z.label = c("Ground", "Net", "2m", "3m", "4m"))
#--- Top down court view
court_topdown <- ggplot() + 
    labs(x = "x direction", y = "y direction") + 
    scale_x_continuous(breaks = axis_labels$x.break,
                       labels = axis_labels$x.label) +
    scale_y_continuous(breaks = axis_labels$y.break,
                       labels = axis_labels$y.label) +
    geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
    geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
    coord_fixed()
p.state <- c(1,1,1,1,1,1,1,1,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,1,1,2,2)
p.state <- as.data.frame(p.state)

fed16_longest <- cbind(p.state, fed16_longest)

p.rally.anim10 <- ggplot() + 
  scale_x_continuous(breaks = axis_labels$x.break) +
    scale_y_continuous(breaks = axis_labels$y.break) +
    geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
    geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
    coord_fixed() +
  geom_point(data= fed16_longest, 
             aes(x=p.start.position.x, y=p.start.position.y,  group = rally.number, colour="red", size = p.state)) +
  geom_path(data= fed16_longest, 
            aes(x=p.start.position.x, y=p.start.position.y,  group = rally.number, alpha = shot)) +
  geom_point(data= fed16_longest, aes(x=oppo.start.position.x, y=oppo.start.position.y,  
                                        group = rally.number, colour="blue")) +
  geom_path(data= fed16_longest,aes(x=oppo.start.position.x, y=oppo.start.position.y,  
                                      group = rally.number, alpha = shot))+
  geom_point(data= fed16_longest, 
             aes(x=start.x, y=start.y, group = rally.number, colour="green")) +
  transition_reveal(pos.rally.count) +
  xlab("Player Position X ") + 
  ylab("Player Position Y") + 
  ggtitle("Federer 2016 Longest Rally Animation")+
  scale_colour_manual(name = "", values=c("red","green","blue"), labels= c("Berdych","Ball","Federer"))

#had to reverse geom_point colour labels for some reason

animate(p.rally.anim10, duration = 30, fps = 10)
## Warning in max(frame): no non-missing arguments to max; returning -Inf

## Warning in max(frame): no non-missing arguments to max; returning -Inf

Choosing variables for the model

Confirming choices of covariates through visualising covariates relationship to the response variable.

Player Start Position Coordinates

p.start.position.x- When compared to similar start position x coordinates, Federer appears more likely to hit a winner as he starts the shot closer to the net. Choosing this variable for HMM as we might expect that the probability of Federer being in an attacking state increases as he moves closer to the net( x value increases).

p.start.position.y- Choosing this variable for the HMM as it makes the players start position x coordinates more meaningful when included in the model. We might expect that the probability of Federer being in an attacking state increases as he moves closer to the centre(y value decreases) and to the net. Using absolute value to measure distance from centre.

ggplot(fed16_only_plots, aes(x=p.start.position.x,  y=p.start.position.y ))    + 
  geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
  geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
  coord_fixed() +
  geom_point(aes(alpha = winner))  +   
  xlab("Federer Start Position X ") + ylab("Federer Start Position Y") +
  ggtitle("Federer winners by start position")

ggplot(fed16_only, aes(x=p.start.position.x,winner)) + geom_smooth() + xlab("X Coordinate of Player") + ylab("Proportion of Winners") + ggtitle("Federer Winners By X Co-Ordinate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed16_only, aes(x=p.start.position.y,winner)) + geom_smooth() + xlab("Distance from Centre of Player") + ylab("Proportion of Winners") + ggtitle("Federer Winners By Distance from Centre")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Player Shot Position Coordinates

p.start.shot.x- Similar to Federer’s start position x coordinates, the x coordinates of his shot hit point appears more likely to result in a winner compared to similarly distanced shots as he moves closer to the net.

p.start.shot.y- Including in HMM to make shot x coordinates more meaningful. Using absolute to measures distance from centre.

ggplot(fed16_only_plots, aes(x=p.start.shot.x,  y=p.start.shot.y ))    + 
  geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
  geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
  coord_fixed() +
  geom_point(aes(alpha = winner))  +   
  xlab("Shot Position X ") + ylab("Shot Position Y") +
  ggtitle("Federer winners by shot location")

opponent Start Position Coordinates

oppo.start.position.x- Federer appears more likely to hit a winner as Berdych start position x coordinates are closer to the net. (As we are analysing the rallies with all of Berdych’s having positive coordinates in the HMM we would expect a decrease in Berdych’s start position x coordinates to increase The probability of Federer being in an attacking state. oppo.start.position.y

ggplot(fed16_only_plots, aes(x=oppo.start.position.x,  y=oppo.start.position.y ))    + 
  scale_x_continuous(breaks = axis_labels$x.break) +
  scale_y_continuous(breaks = axis_labels$y.break) +
  geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
  geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
  coord_fixed() +
  geom_point(aes(alpha = winner))  +   
  xlab("Opponent Start Position X ") + ylab("Opponent Start Position Y") +
  ggtitle("Federer winners by Opponent Start Position")

ggplot(fed16_only, aes(x=oppo.start.position.x,winner)) + geom_smooth() + xlab("X Coordinate of Opponent") + ylab("Proportion of Federer Winners") + ggtitle("Federer Winners By Opponent X Co-Ordinate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed16_only, aes(x=oppo.start.position.y,winner)) + geom_smooth() + xlab("Distance from Centre of Opponent") + ylab("Proportion of Winners") + ggtitle("Federer Winners by Opponent Distance from Centre")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Difference in player movement speed from the match average

p.diff.avg.shot.and.match.movement.speed - When Federer runs approximately more than 1m/s above or below his average movement speed for match he tends to hit more winners. This variable therefore appears suitable for the HMM. (Using an absolute value of the difference we would expect an increase in difference to increase the probability of Federer being in a attacking state. (add in more info) When Considering winners, errors and returns we move back to relative difference as Federer tends to hit more errors than winners when moving approximately 1m/s slower than his average match movement speed

ggplot(fed16_only_plots, aes(x=factor(winner),y=p.diff.avg.shot.and.match.movement.speed, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by difference in Federer's current and match avg movement speed")

ggplot(fed16_only_plots, aes(x=factor(winner),y=p.diff.avg.shot.and.match.movement.speed, fill=factor(winner))) + geom_violin(scale = "area") + ggtitle("Winner by difference in Federer's current and match avg movement speed")

Difference in opponent movement speed from the match average

oppo.diff.avg.shot.and.match.movement.speed - Federer tends to hit more winners as Berdych runs more than 1m/s faster than his average match movement speed. This is a suitable variable for the HMM we would expect an increase in the difference to increase Federer’s probability of being in an attacking state.

ggplot(fed16_only_plots, aes(x=factor(winner),y=oppo.diff.avg.shot.and.match.movement.speed, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by difference in Opponent current/match avg movement speed")

ggplot(fed16_only_plots, aes(x=factor(winner),y=oppo.diff.avg.shot.and.match.movement.speed, fill=factor(winner))) + geom_violin(scale = "area") + ggtitle("Winner by difference in Opponent current/match avg movement speed")

###Number of times the player changes side during the rally p.rally.side.change.count - Not suitable

ggplot(fed16_only_plots, aes(x=p.rally.side.change.count, y=winner)) + geom_bar(stat="identity") + xlab("Rally Side Change Count") + ylab("Proportion of Winners") + ggtitle("Winners By Number of Times Federer Changes Side in Rally")

Number of times the opponent changes sides during the rally

oppo.rally.side.change.count - Might be suitable when looking at a greater number of rallies.

ggplot(fed16_only_plots, aes(x=oppo.rally.side.change.count, y=winner)) + geom_bar(stat="identity") + xlab("Rally Side Change Count") + ylab("Proportion of Winners") + ggtitle("Winners By Number of Times Berdych Changes Side in Rally")

The angle the player runs from the starrt to the end of the shot

p.movement.angle.1 - more than 75% of Federer’s Winners come when he is running less than 100 degrees in relation to the baseline. This variable appears suitable for HMM we would expect as Federer’s movement angle increases beyond 100 degrees the probability the he is an attacking state decreases.

ggplot(fed16_only_plots, aes(x=p.movement.angle.1,y=winner)) + geom_smooth() + xlab("Angle Made By Federer's movement") + ylab("Proportion of Winners") + ggtitle("Winners By Federer's start to end movement angle")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed16_only, aes(x=factor(winner),y=p.movement.angle.1, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by Federer's start to end movement angle")

ggplot(fed16_only, aes(x=factor(winner),y=p.movement.angle.1, fill=factor(winner))) + geom_violin(scale = "area") + ggtitle("Winner by Federer's start to end movement angle")

# ggplot(fed16_only, aes(x=factor(winner.return.error),y=p.movement.angle.1, fill=factor(winner.return.error))) + geom_violin(scale = "area") + ggtitle("Winner by Federer's start to end movement angle")

The angle the opponent runs from the start to the end of the shot

oppo.movement.angle.1 - Need more evidence to include in model

ggplot(fed16_only_plots, aes(x=oppo.movement.angle.1,y=winner)) + geom_smooth() + xlab("Angle Made By Berdych's movement") + ylab("Proportion of Winners") + ggtitle("Winners By Berdych's start to end movement angle during Federer's shot")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed16_only, aes(x=factor(winner),y=oppo.movement.angle.1, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by Berdych's start to end movement angle during Federer's shot")

ggplot(fed16_only, aes(x=factor(winner),y=oppo.movement.angle.1, fill=factor(winner))) + geom_violin(scale = "area") + ggtitle("Winner by Berdych's start to end movement angle during Federer's shot")

The height the opponent hit the ball over the net with their previous shot

lag.oppo.height.off.net - Federer tends to hit more winners when incoming shot approaches 50cm off the height of the net. This decreases as the height of the incoming shot increases or decreases. Including this variable in the HMM. Expect the coefficient to be small.

ggplot(fed16_only_plots, aes(x=lag.oppo.height.off.net,y=winner)) + geom_smooth() + xlab("Height of opponents shot") + ylab("Proportion of Winners") + ggtitle("Winners by Height over Net of Most recent Opponent shot")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

###The difference in players current shot speed from the match average diff.p.avg.and.current.shot.speed

ggplot(fed16_only, aes(x=factor(winner),y=diff.p.avg.and.current.shot.speed, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by player current/average shot speed difference")

Appendix

The appendix contains:

-The Depmix models and visual analysis of the state probabilities of these models.

-The step-wise analysis to find the best covariates for the Depmix models.

-Additional visual analysis of variables.

Visualisations of Shot Patterns for Federer vs Berdych 2016

Here is a series of plots showing:

-Shot type by shot x/y coordinates

-A recreation of plots from the winter project on different match data.

#Shot Type by coordinates
ggplot(dlb_fedonly, aes(x=start.x,y=start.y)) + 
  scale_x_continuous(breaks = axis_labels$x.break) +
    scale_y_continuous(breaks = axis_labels$y.break) +
    geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
    geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
    coord_fixed() +
  geom_point() + 
  xlab("X Coordinate") + 
  ylab("Y Coordinate") + 
  ggtitle("Federer Down the Line Backhands By Shot Co-Ordinates") 

ggplot(iof_fedonly, aes(x=start.x,y=start.y)) + 
  scale_x_continuous(breaks = axis_labels$x.break) +
    scale_y_continuous(breaks = axis_labels$y.break) +
    geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
    geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
    coord_fixed() +
  geom_point() + 
  xlab("X Coordinate") + 
  ylab("Y Coordinate") + 
  ggtitle("Federer Inside out Forehands By Shot Co-Ordinates") 

ggplot(dlw_fedonly, aes(x=start.x,y=start.y)) + 
  scale_x_continuous(breaks = axis_labels$x.break) +
    scale_y_continuous(breaks = axis_labels$y.break) +
    geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
    geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
    coord_fixed() +
  geom_point() + 
  xlab("X Coordinate") + 
  ylab("Y Coordinate") + 
  ggtitle("Federer Down the Line Winners By Shot Co-Ordinates")

# #Movement and shot position seperated by forehand and backhand
# ggplot(longral_df, aes(x=p.start.position.x,y=p.start.position.y,z=winner))  + stat_summary_hex(fun = function(winner) sum(winner)) + facet_wrap(~factor(hitpoint)) + xlab("Player start Position X ") + ylab("Player start Position Y") + ggtitle("Federer winners by start position seperated by shot type for long rallies")
# 
# ggplot(longral_df, aes(x=p.end.position.x,y=p.end.position.y,z=winner))  + stat_summary_hex(fun = function(winner) sum(winner)) + facet_wrap(~factor(hitpoint)) + xlab("Player end Position X ") + ylab("Player end Position Y") + ggtitle("Federer winners by end position seperated by shot type for long rallies")
# 
# ggplot(longral_df, aes(x=p.start.position.x,y=p.start.position.y,z=is.good))  + stat_summary_hex(fun = function(is.good) sum(is.good)) + facet_wrap(~factor(hitpoint)) + xlab("Player start Position X ") + ylab("Player start Position Y") + ggtitle("Federer errors by start position seperated by shot type for long rallies")
# 
# ggplot(longral_df, aes(x=p.end.position.x,y=p.end.position.y,z=is.good))  + stat_summary_hex(fun = function(is.good) sum(is.good)) + facet_wrap(~factor(hitpoint)) + xlab("Player end Position X ") + ylab("Player end Position Y") + ggtitle("Federer errors by end position seperated by shot type for long rallies")
# 
# ggplot(longral_df, aes(x=p.start.position.x,y=p.start.position.y,z=time.to.net))  + stat_summary_hex(fun = function(time.to.net) sum(time.to.net)) + facet_wrap(~factor(hitpoint)) + xlab("Player start Position X ") + ylab("Player start Position Y") + ggtitle("Federer shot time to net by start position seperated by shot type for long rallies")
# 
# ggplot(longral_df, aes(x=p.end.position.x,y=p.end.position.y,z=time.to.net))  + stat_summary_hex(fun = function(time.to.net) sum(time.to.net)) + facet_wrap(~factor(hitpoint)) + xlab("Player end Position X ") + ylab("Player end Position Y") + ggtitle("Federer shot time to net by end position seperated by shot type for long rallies")
#Visulaising speed ratio

ggplot(fed16_df, aes(x=base.dist,y=speed.ratio)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Speed Ratio") + ggtitle("Speed ratio by distance from the baseline For match")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

ggplot(fed16_only, aes(x=base.dist,y=speed.ratio)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Speed Ratio") + ggtitle("Fed Only Speed ratio by distance from the baseline For match")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(longral_df, aes(x=base.dist,y=speed.ratio)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Speed Ratio") + ggtitle("Speed ratio by distance from the baseline For Long Rallies")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed16_longest, aes(x=base.dist,y=speed.ratio)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Speed Ratio") + ggtitle("Speed ratio by distance from the baseline For Longest Rally")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Visualising player shot angles

ggplot(fed16_df, aes(x=base.dist,y=p.angle)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Angle Made By Federers Shot") + ggtitle("Federer Shot Angle by distance from the baseline")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

ggplot(fed16_only, aes(x=base.dist,y=p.angle)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Angle Made By Federers Shot") + ggtitle("Fed only Shot Angle by distance from the baseline")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed16_df, aes(x=base.dist,y=o.angle)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Angle Made By Berdych previous Shot") + ggtitle("Berdych Shot Angle by Federer distance from the baseline")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 219 rows containing non-finite values (stat_smooth).

##Plots of shot and ballmark co-ordinates, angles made by shots and speed 
library(ggplot2)
#X Co-ordinate of shot
ggplot(fed16_only, aes(x=start.x,winner)) + geom_smooth() + xlab("X Coordinate of Shot") + ylab("Proportion of Winners") + ggtitle("Winners By X Co-Ordinate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#Opponents shot before
library(hexbin)
ggplot(fed16_only, aes(x=oppo.hit.x,y=oppo.hit.y,z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + xlab("X Coordinate of Oppo Shot") + ylab("Y Coordinate of Oppo Shot") + ggtitle("Count of Winners By Opponent's Shot Co-Ordinates")

### Get help transforming this to proportion in each bin instead of raw count

#Fed Shots on x-y plane
ggplot(fed16_only, aes(x=start.x,y=start.y,z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + xlab("X Coordinate") + ylab("Y Coordinate") + ggtitle("Winners By Shot Co-Ordinates")

#Fed Ballmark by winner in x-y plane
ggplot(fed16_only, aes(x=projected.ballmark.x,y=projected.ballmark.y,z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + xlab("X Coordinate") + ylab("Y Coordinate") + ggtitle("Winners By Shot Ballmark Co-Ordinates")

#Angle of shots
ggplot(fed16_only, aes(x=o.angle,y=winner)) + geom_smooth() + xlab("Angle Made By Opponents Shot With Fed's Previous Shot") + ylab("Proportion of Winners") + ggtitle("Winners By Opponent Shot Angle")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed16_only, aes(x=p.angle,y=winner)) + geom_smooth() + xlab("Angle Made By Federer's Shot") + ylab("Proportion of Winners") + ggtitle("Winners By Federer's Shot Angle")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(fed16_only) + geom_density(aes(p.angle,group=factor(winner),color=factor(winner))) + ggtitle("Density of Winners by Fed Shot Angles")

ggplot(fed16_only) + geom_density(aes(o.angle,group=factor(winner),color=factor(winner))) + ggtitle("Density of Winners by Opponent Shot Angles")

#Winners by oppo speed
ggplot(fed16_only) + geom_density(aes(oppo.speed, group = factor(winner), color=factor(winner)))

ggplot(fed16_only, aes(x=factor(winner),y=oppo.speed,fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by Opponent Speed")

#Winners by speed ratio
ggplot(fed16_only) + geom_density(aes(speed.ratio, group = factor(winner), color=factor(winner)))

ggplot(fed16_only, aes(x=factor(winner),y=speed.ratio,fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by Speed Ratio")

##Depmix Models All of the depmix models run. Some three state models and step-wise analysis have been commented out because of causing issues with the knit. The code can still be run in Rmarkdown by uncommenting. Examined three data sets: Data Set One: Federer vs Berdych 2016 Data Set Two: Federer vs Berdych 2016/17/17 Data Set Three: Strycova vs Garcia 2016/17 ###Federer 2016 Data Set 1 Training Model 1 for Data Set 1 (Federer vs Berdych 2016) Training Model 1 is the best combination of variables found from running step-wise on Data Set 1

Running Data Set 1 Training Model 1, (2 states)

# Variables fitted:
#p.start.position.x, lag.oppo.height.off.net, oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, diff.p.avg.and.current.shot.speed, p.movement.angle.1
 ds1.mod1.2s <- depmix(winner ~ 1, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + diff.p.avg.and.current.shot.speed + p.movement.angle.1, data = fed16_scale, nstates = 2, family=multinomial("identity"))
 ds1.fm1.2s <- fit(ds1.mod1.2s)
## iteration 0 logLik: -77.52065 
## iteration 5 logLik: -75.19778 
## iteration 10 logLik: -71.93352 
## iteration 15 logLik: -70.06745 
## iteration 20 logLik: -68.04245 
## iteration 25 logLik: -66.61357 
## iteration 30 logLik: -65.83822 
## iteration 35 logLik: -64.32856 
## iteration 40 logLik: -60.65898 
## iteration 45 logLik: -59.343 
## iteration 50 logLik: -59.13124 
## iteration 55 logLik: -59.03431 
## iteration 60 logLik: -58.93476 
## iteration 65 logLik: -58.86841 
## iteration 70 logLik: -58.82553 
## iteration 75 logLik: -58.78111 
## iteration 80 logLik: -58.65491 
## iteration 85 logLik: -58.06383 
## iteration 90 logLik: -56.25788 
## iteration 95 logLik: -55.54676 
## iteration 100 logLik: -55.08047 
## iteration 105 logLik: -54.80902 
## iteration 110 logLik: -54.53245 
## iteration 115 logLik: -54.35062 
## iteration 120 logLik: -54.20976 
## iteration 125 logLik: -54.14172 
## iteration 130 logLik: -54.02673 
## converged at iteration 134 with logLik: -54.01444
 summary(ds1.fm1.2s)
## Initial state probabilties model 
## pr1 pr2 
##   1   0 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + 
##     oppo.start.position.x + diff.p.avg.and.current.shot.speed + 
##     p.movement.angle.1
## Coefficients: 
##                                             St1         St2
## (Intercept)                                   0  31.0832797
## p.start.position.x                            0 108.7561342
## lag.oppo.height.off.net                       0 -99.0752111
## oppo.diff.avg.shot.and.match.movement.speed   0   1.5101516
## oppo.start.position.x                         0 -28.7908402
## diff.p.avg.and.current.shot.speed             0 -14.1050364
## p.movement.angle.1                            0   0.7189997
## Probalities at zero values of the covariates.
## 3.167402e-14 1 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + 
##     oppo.start.position.x + diff.p.avg.and.current.shot.speed + 
##     p.movement.angle.1
## Coefficients: 
##                                             St1        St2
## (Intercept)                                   0 -154.34203
## p.start.position.x                            0 -648.26071
## lag.oppo.height.off.net                       0  364.63212
## oppo.diff.avg.shot.and.match.movement.speed   0 -125.03439
## oppo.start.position.x                         0  256.86055
## diff.p.avg.and.current.shot.speed             0   70.37259
## p.movement.angle.1                            0   63.54975
## Probalities at zero values of the covariates.
## 1 9.3349e-68 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.0 Re1.1
## St1 0.609 0.391
## St2 1.000 0.000
#Pulling state probabilities from fitted model to add to a dataframe to look for correlation with fitted variables
ds1.fm1.2s_df <- posterior(ds1.fm1.2s)

ds1.fm1.2s_df <- cbind(fed16_scale, ds1.fm1.2s_df)
    
#dropping columns not in the fitted depmix model

 ds1.fm1.2s_df <- ds1.fm1.2s_df[ -c(2, 4, 6:8, 12:14) ]

Plotting Data Set 1 Training Model 1(2 States)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#Plotting states for training model fed16
#p.start.position.x, oppo.start.position.x, oppo.diff.avg.shot.and.match.movement.speed, p.diff.avg.shot.and.match.movement.speed, lag.p.angle, oppo.rally.side.change.count, diff.lag.oppo.avg.and.current.shot.speed, diff.p.avg.and.current.shot.speed, p.movement.angle.1, lag.oppo.height.off.net, oppo.start.position.y

probs = posterior(ds1.fm1.2s)

  
 
    plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')

    matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
        legend(x='topright', c('State1','State2'),  fill=1:2, bty='n')

ggplot() + 
  geom_path(data=fed16_scale, aes(x = player.total.shot.number, y = winner)) +
  xlab("Federer shot Number")  + 
  ylab("Actual State") + 
  ggtitle("Actual state")   

#plotting a matrix to identify any correlation between depmix probabilities and chosen variables

library(GGally)
## Warning: package 'GGally' was built under R version 3.5.2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
ggduo(ds1.fm1.2s_df, 1:3, 8, showStrips = FALSE)

ggduo(ds1.fm1.2s_df, 4:5, 8, showStrips = FALSE)

#showing correlation between variables and probaility of being in state 1

cor.ds1.fm1.2s <- cor(ds1.fm1.2s_df)
cor.ds1.fm1.2s_df <- cor.ds1.fm1.2s[ -c(1:5, 7) ]

Running Data Set 1, Training Model 1 (3 states)

# Variables fitted:
#p.start.position.x, lag.oppo.height.off.net, oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, diff.p.avg.and.current.shot.speed, p.movement.angle.1
 ds1.mod1.3s <- depmix(winner.return.error ~ 1, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + diff.p.avg.and.current.shot.speed + p.movement.angle.1, data = fed16_scale, nstates = 3, family=multinomial("identity"))
 ds1.fm1.3s <- fit(ds1.mod1.3s)
## iteration 0 logLik: -128.72 
## iteration 5 logLik: -125.902 
## iteration 10 logLik: -118.5809 
## iteration 15 logLik: -107.8256 
## iteration 20 logLik: -98.11433 
## iteration 25 logLik: -84.66842 
## iteration 30 logLik: -77.11373 
## iteration 35 logLik: -74.07867 
## iteration 40 logLik: -72.1316 
## iteration 45 logLik: -71.34158 
## iteration 50 logLik: -70.93945 
## iteration 55 logLik: -70.75966 
## iteration 60 logLik: -70.69247 
## iteration 65 logLik: -70.66245 
## iteration 70 logLik: -70.64947 
## iteration 75 logLik: -70.64291 
## iteration 80 logLik: -70.63956 
## iteration 85 logLik: -70.63843 
## iteration 90 logLik: -70.63677 
## iteration 95 logLik: -70.63634 
## iteration 100 logLik: -70.63633 
## iteration 105 logLik: -70.63632 
## iteration 110 logLik: -70.63631 
## converged at iteration 114 with logLik: -70.6363
 summary(ds1.fm1.3s)
## Initial state probabilties model 
## pr1 pr2 pr3 
##   0   1   0 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + 
##     oppo.start.position.x + diff.p.avg.and.current.shot.speed + 
##     p.movement.angle.1
## Coefficients: 
##                                             St1       St2         St3
## (Intercept)                                   0  189.3291 -333.198812
## p.start.position.x                            0 -273.7255 -242.617301
## lag.oppo.height.off.net                       0 -298.4381  228.919551
## oppo.diff.avg.shot.and.match.movement.speed   0  270.0946  288.105835
## oppo.start.position.x                         0 -192.8203  191.904321
## diff.p.avg.and.current.shot.speed             0 -146.0170   -5.160971
## p.movement.angle.1                            0  177.8477  296.384584
## Probalities at zero values of the covariates.
## 5.962609e-83 1 1.172278e-227 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + 
##     oppo.start.position.x + diff.p.avg.and.current.shot.speed + 
##     p.movement.angle.1
## Coefficients: 
##                                             St1         St2         St3
## (Intercept)                                   0   30.495955   51.354835
## p.start.position.x                            0 -147.470327    5.550402
## lag.oppo.height.off.net                       0  151.587286 -124.041505
## oppo.diff.avg.shot.and.match.movement.speed   0   -1.920757  -42.669313
## oppo.start.position.x                         0  -15.909155   81.835023
## diff.p.avg.and.current.shot.speed             0  -11.764842  -10.982834
## p.movement.angle.1                            0  -24.727070  -39.510764
## Probalities at zero values of the covariates.
## 4.975978e-23 8.731796e-10 1 
## 
## Transition model for state (component) 3 
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + 
##     oppo.start.position.x + diff.p.avg.and.current.shot.speed + 
##     p.movement.angle.1
## Coefficients: 
##                                             St1        St2        St3
## (Intercept)                                   0 -164.59080 -116.87490
## p.start.position.x                            0  110.77509   54.85293
## lag.oppo.height.off.net                       0  -43.54872  -20.88380
## oppo.diff.avg.shot.and.match.movement.speed   0  -30.00681  -46.70615
## oppo.start.position.x                         0  342.23172  308.58827
## diff.p.avg.and.current.shot.speed             0 -131.28350 -144.98309
## p.movement.angle.1                            0  -62.06163  -42.12094
## Probalities at zero values of the covariates.
## 1 3.304645e-72 1.745328e-51 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.0 Re1.0.5 Re1.1
## St1 0.257   0.727 0.016
## St2 0.000   0.279 0.721
## St3 0.000   1.000 0.000

Plotting Data Set 1 Training Model 1 (3 states)

#plotting Data Set 1 Training Model 1 with 3 states
probs = posterior(ds1.fm1.3s)

    
    
    plot((probs$state), type='s', main='Implied States', xlab='', ylab='State')

    matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
        legend(x='topright', c('State1','State2', "State2"),  fill=1:3, bty='n')

ggplot() + 
  geom_path(data=fed16_scale, aes(x = player.total.shot.number, y = winner.return.error)) +
  xlab("Federer shot Number")  + 
  ylab("Actual State") + 
  ggtitle("Actual state") 

###Federer Vs Berdych 2016/17/18 Data Set 2 Training Model two is the best combination of variables from running Step-wise on Data Set 2

Running Training Model 1 on Data Set 2 (multiple Fed vs Berdych matches 2016, 2017 & 2018)

Data Set 2 Training Model 1 (2 states)

# Variables fitted:
#p.start.position.x, lag.oppo.height.off.net, oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, diff.p.avg.and.current.shot.speed, p.movement.angle.1
 ds2.mod1.2s <- depmix(winner ~ 1, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + diff.p.avg.and.current.shot.speed + p.movement.angle.1, data = fed_scale, nstates = 2, family=multinomial("identity"))
 ds2.fm1.2s <- fit(ds2.mod1.2s)
## iteration 0 logLik: -193.5052 
## iteration 5 logLik: -193.453 
## iteration 10 logLik: -193.0896 
## iteration 15 logLik: -191.1532 
## iteration 20 logLik: -186.3481 
## iteration 25 logLik: -183.0758 
## iteration 30 logLik: -182.2625 
## iteration 35 logLik: -181.9671 
## iteration 40 logLik: -181.6846 
## iteration 45 logLik: -181.344 
## iteration 50 logLik: -180.9735 
## iteration 55 logLik: -180.6409 
## iteration 60 logLik: -180.396 
## iteration 65 logLik: -180.2425 
## iteration 70 logLik: -180.1479 
## iteration 75 logLik: -180.0808 
## iteration 80 logLik: -180.0245 
## iteration 85 logLik: -179.9722 
## iteration 90 logLik: -179.9209 
## iteration 95 logLik: -179.8688 
## iteration 100 logLik: -179.8142 
## iteration 105 logLik: -179.7555 
## iteration 110 logLik: -179.6907 
## iteration 115 logLik: -179.6175 
## iteration 120 logLik: -179.5334 
## iteration 125 logLik: -179.4366 
## iteration 130 logLik: -179.3283 
## iteration 135 logLik: -179.2131 
## iteration 140 logLik: -179.0959 
## iteration 145 logLik: -178.9787 
## iteration 150 logLik: -178.8625 
## iteration 155 logLik: -178.7491 
## iteration 160 logLik: -178.6395 
## iteration 165 logLik: -178.5353 
## iteration 170 logLik: -178.4373 
## iteration 175 logLik: -178.3464 
## iteration 180 logLik: -178.263 
## iteration 185 logLik: -178.187 
## iteration 190 logLik: -178.1185 
## iteration 195 logLik: -178.0569 
## iteration 200 logLik: -178.0017 
## iteration 205 logLik: -177.9526 
## iteration 210 logLik: -177.909 
## iteration 215 logLik: -177.8704 
## iteration 220 logLik: -177.8362 
## iteration 225 logLik: -177.8062 
## iteration 230 logLik: -177.7797 
## iteration 235 logLik: -177.7565 
## iteration 240 logLik: -177.7361 
## iteration 245 logLik: -177.7183 
## iteration 250 logLik: -177.7027 
## iteration 255 logLik: -177.6891 
## iteration 260 logLik: -177.6773 
## iteration 265 logLik: -177.667 
## iteration 270 logLik: -177.6581 
## iteration 275 logLik: -177.6503 
## iteration 280 logLik: -177.6435 
## iteration 285 logLik: -177.6376 
## iteration 290 logLik: -177.6326 
## iteration 295 logLik: -177.6282 
## iteration 300 logLik: -177.6243 
## iteration 305 logLik: -177.621 
## iteration 310 logLik: -177.6182 
## iteration 315 logLik: -177.6157 
## iteration 320 logLik: -177.6136 
## iteration 325 logLik: -177.6118 
## iteration 330 logLik: -177.6102 
## iteration 335 logLik: -177.6088 
## iteration 340 logLik: -177.6077 
## iteration 345 logLik: -177.6066 
## iteration 350 logLik: -177.6057 
## iteration 355 logLik: -177.6049 
## iteration 360 logLik: -177.6042 
## iteration 365 logLik: -177.6037 
## iteration 370 logLik: -177.6032 
## iteration 375 logLik: -177.6027 
## iteration 380 logLik: -177.6024 
## iteration 385 logLik: -177.602 
## iteration 390 logLik: -177.6018 
## iteration 395 logLik: -177.6015 
## iteration 400 logLik: -177.6013 
## iteration 405 logLik: -177.6011 
## iteration 410 logLik: -177.601 
## iteration 415 logLik: -177.6008 
## iteration 420 logLik: -177.6007 
## iteration 425 logLik: -177.6006 
## iteration 430 logLik: -177.6005 
## iteration 435 logLik: -177.6005 
## iteration 440 logLik: -177.6004 
## iteration 445 logLik: -177.6003 
## iteration 450 logLik: -177.6003 
## iteration 455 logLik: -177.6002 
## iteration 460 logLik: -177.6002 
## iteration 465 logLik: -177.6002 
## iteration 470 logLik: -177.6002 
## iteration 475 logLik: -177.6001 
## iteration 480 logLik: -177.6001 
## iteration 485 logLik: -177.6001 
## iteration 490 logLik: -177.6001 
## iteration 495 logLik: -177.6001 
## iteration 500 logLik: -177.6001
 summary(ds2.fm1.2s)
## Initial state probabilties model 
## pr1 pr2 
##   1   0 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + 
##     oppo.start.position.x + diff.p.avg.and.current.shot.speed + 
##     p.movement.angle.1
## Coefficients: 
##                                             St1        St2
## (Intercept)                                   0  5.3549862
## p.start.position.x                            0  1.9884892
## lag.oppo.height.off.net                       0  2.2153863
## oppo.diff.avg.shot.and.match.movement.speed   0 -2.7077086
## oppo.start.position.x                         0 -1.3917780
## diff.p.avg.and.current.shot.speed             0 -1.9346839
## p.movement.angle.1                            0 -0.6188697
## Probalities at zero values of the covariates.
## 0.004702319 0.9952977 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + 
##     oppo.start.position.x + diff.p.avg.and.current.shot.speed + 
##     p.movement.angle.1
## Coefficients: 
##                                             St1         St2
## (Intercept)                                   0  1.27833029
## p.start.position.x                            0 -0.10527171
## lag.oppo.height.off.net                       0 -0.07047395
## oppo.diff.avg.shot.and.match.movement.speed   0 -1.19928715
## oppo.start.position.x                         0  0.57717782
## diff.p.avg.and.current.shot.speed             0  0.61862092
## p.movement.angle.1                            0  1.44618426
## Probalities at zero values of the covariates.
## 0.2178346 0.7821654 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.0 Re1.1
## St1     0     1
## St2     1     0

Plotting Data Set 2 Training Model 1 (2 states)

probs = posterior(ds2.fm1.2s)

    
    plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')

    matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
        legend(x='topright', c('State1','State2'),  fill=1:2, bty='n')

#Plotting probability states for test model fed16/17/18
#p.start.position.x, oppo.start.position.x, oppo.diff.avg.shot.and.match.movement.speed, p.diff.avg.shot.and.match.movement.speed, lag.p.angle, oppo.rally.side.change.count, diff.lag.oppo.avg.and.current.shot.speed, diff.p.avg.and.current.shot.speed, p.movement.angle.1, lag.oppo.height.off.net, oppo.start.position.y

#Pulling state probabilities from fitted model to add to a dataframe to look for correlation with fitted variables
ds2.fm1.2s_df <- posterior(ds2.fm1.2s)

ds2.fm1.2s_df <- cbind(fed_scale, ds2.fm1.2s_df)
    
#dropping columns not in the fitted depmix model

# ds2.fm1.2s_df <- ds2.fm1.2s_df[ -c(2, 6) ]
#plotting a matrix to identify any correlation between depmix probabilities and chosen variables
library(GGally)
# ggduo(ds2.fm1.2s_df, 1:3, 14, showStrips = FALSE)
# ggduo(ds2.fm1.2s_df, 4:6, 14, showStrips = FALSE)
# ggduo(ds2.fm1.2s_df, 7:9, 14, showStrips = FALSE)
# ggduo(ds2.fm1.2s, 10:11, 14, showStrips = FALSE)
#showing correlation between variables and probaility of being in state 1

# cor.ds2.fm1.2s <- cor(ds2.fm1.2s_df)
# cor.ds2.fm1.2s_df <- as.data.frame(apply(cor.ds2.fm1.2s, 2, function(x) ifelse (abs(x) >=0-1,x,"NA")))
# cor.ds2.fm1.2s_df <- cor.ds2.fm1.2s_df[ -c(1:13, 15) ]

Running Training Model 2 on Data set 2

Data Set 2 Training Model 2 (2 states)

#Training Model 2 has the best variables from stepwise run on Data Set 2
#p.movement.angle.1, oppo.diff.avg.shot.and.match.movement.speed, p.start.position.y, oppo.start.position.x, oppo.start.position.y, p.diff.avg.shot.and.match.movement.speed,  lag.oppo.height.off.net

#running this model on the scaled fed_only data set
ds2.mod2.2s <- depmix(winner ~ 1, transition = ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = fed_scale, nstates = 2, family=multinomial("identity"))
 ds2.fm2.2s <- fit(ds2.mod2.2s)
## iteration 0 logLik: -193.5032 
## iteration 5 logLik: -193.4286 
## iteration 10 logLik: -192.9374 
## iteration 15 logLik: -190.6037 
## iteration 20 logLik: -185.7654 
## iteration 25 logLik: -182.6925 
## iteration 30 logLik: -181.4189 
## iteration 35 logLik: -180.5176 
## iteration 40 logLik: -179.7856 
## iteration 45 logLik: -179.357 
## iteration 50 logLik: -179.1796 
## iteration 55 logLik: -179.0803 
## iteration 60 logLik: -178.8923 
## iteration 65 logLik: -178.2403 
## iteration 70 logLik: -177.312 
## iteration 75 logLik: -176.6445 
## iteration 80 logLik: -176.1105 
## iteration 85 logLik: -175.6443 
## iteration 90 logLik: -175.2695 
## iteration 95 logLik: -174.9793 
## iteration 100 logLik: -174.7172 
## iteration 105 logLik: -174.4061 
## iteration 110 logLik: -173.9563 
## iteration 115 logLik: -173.1872 
## iteration 120 logLik: -172.2315 
## iteration 125 logLik: -171.6512 
## iteration 130 logLik: -171.1833 
## iteration 135 logLik: -170.6749 
## iteration 140 logLik: -170.2238 
## iteration 145 logLik: -169.744 
## iteration 150 logLik: -169.4614 
## iteration 155 logLik: -169.2892 
## iteration 160 logLik: -169.192 
## iteration 165 logLik: -169.1299 
## iteration 170 logLik: -169.0759 
## iteration 175 logLik: -169.0327 
## iteration 180 logLik: -168.9982 
## iteration 185 logLik: -168.9677 
## iteration 190 logLik: -168.9502 
## iteration 195 logLik: -168.9239 
## iteration 200 logLik: -168.9185 
## converged at iteration 201 with logLik: -168.9185
 summary(ds2.fm2.2s)
## Initial state probabilties model 
## pr1 pr2 
##   1   0 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2
## (Intercept)                                   0  151.96159
## p.movement.angle.1                            0   52.74108
## oppo.diff.avg.shot.and.match.movement.speed   0 -222.72829
## p.start.position.y                            0   15.31235
## oppo.start.position.x                         0 -232.65781
## oppo.start.position.y                         0  111.45715
## p.diff.avg.shot.and.match.movement.speed      0  325.63549
## lag.oppo.height.off.net                       0  357.17549
## Probalities at zero values of the covariates.
## 1.009064e-66 1 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2
## (Intercept)                                   0 -52.067848
## p.movement.angle.1                            0   6.134092
## oppo.diff.avg.shot.and.match.movement.speed   0  -3.716893
## p.start.position.y                            0   5.012778
## oppo.start.position.x                         0  58.534178
## oppo.start.position.y                         0   3.594212
## p.diff.avg.shot.and.match.movement.speed      0   5.419264
## lag.oppo.height.off.net                       0 -15.743651
## Probalities at zero values of the covariates.
## 1 2.439051e-23 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.0 Re1.1
## St1 0.745 0.255
## St2 0.978 0.022

Plotting Probabilities Data Set 2 Training Model 2 (2 states)

probs = posterior(ds2.fm2.2s)

    
    plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')

    matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
        legend(x='topright', c('State1','State2'),  fill=1:2, bty='n')

#Plotting probability states for test model 2 for fed16/17/18


#Pulling state probabilities from fitted model to add to a dataframe to look for correlation with fitted variables
#ds2.fm2.2s_df <- posterior(ds2.fm2.2s)
 
#ds2.fm2.2s_df <- cbind(fed_scale, ds2.fm2.2s_df)
    
#dropping columns not in the fitted depmix model

#ds2.fm2.2s_df <- ds2.fm2.2s_df[ -c() ]
#plotting a matrix to identify any correlation between depmix probabilities and chosen variables
# library(GGally)
# ggduo(ds2.fm2.2s_df, 1:3, 13, showStrips = FALSE)
# ggduo(ds2.fm2.2s_df, 4:6, 13, showStrips = FALSE)
#showing correlation between variables and probaility of being in state 1 for test model 2

# cor.ds2.fm2.2s <- cor(ds2.fm2.2s_df)
# cor.ds2.fm2.2s_df <- as.data.frame(apply(cor.ds2.fm2.2s, 2, function(x) ifelse (abs(x) >=-1,x,"NA")))
# cor.ds2.fm2.2s_df <- cor.ds2.fm2.2s[ -c() ]

`` ###Data Set 2 Training Model 2 (3 States)

#running this model on the scaled fed_only data set
ds2.mod2.3s <- depmix(winner.return.error ~ 1, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = fed_scale, nstates = 3, family=multinomial("identity"))
 ds2.fm2.3s <- fit(ds2.mod2.3s)
## iteration 0 logLik: -362.5245 
## iteration 5 logLik: -362.0193 
## iteration 10 logLik: -360.1707 
## iteration 15 logLik: -356.5291 
## iteration 20 logLik: -353.289 
## iteration 25 logLik: -351.0103 
## iteration 30 logLik: -348.7662 
## iteration 35 logLik: -345.3797 
## iteration 40 logLik: -339.9362 
## iteration 45 logLik: -333.4647 
## iteration 50 logLik: -328.9034 
## iteration 55 logLik: -324.5113 
## iteration 60 logLik: -319.2431 
## iteration 65 logLik: -314.7265 
## iteration 70 logLik: -310.1458 
## iteration 75 logLik: -305.547 
## iteration 80 logLik: -302.403 
## iteration 85 logLik: -301.0839 
## iteration 90 logLik: -300.4038 
## iteration 95 logLik: -299.8475 
## iteration 100 logLik: -298.4973 
## iteration 105 logLik: -297.4937 
## iteration 110 logLik: -296.8352 
## iteration 115 logLik: -296.576 
## iteration 120 logLik: -296.3748 
## iteration 125 logLik: -296.2295 
## iteration 130 logLik: -296.1044 
## iteration 135 logLik: -296.0123 
## iteration 140 logLik: -295.9362 
## iteration 145 logLik: -295.857 
## iteration 150 logLik: -295.8008 
## iteration 155 logLik: -295.7101 
## iteration 160 logLik: -295.6686 
## iteration 165 logLik: -295.6201 
## iteration 170 logLik: -295.5736 
## iteration 175 logLik: -295.5364 
## iteration 180 logLik: -295.4914 
## iteration 185 logLik: -295.471 
## iteration 190 logLik: -295.4475 
## iteration 195 logLik: -295.4322 
## iteration 200 logLik: -295.4082 
## iteration 205 logLik: -295.3905 
## iteration 210 logLik: -295.3746
## Warning in em.depmix(object = object, maxit = emcontrol$maxit, tol =
## emcontrol$tol, : likelihood decreased on iteration 212
 summary(ds2.fm2.3s)
## Initial state probabilties model 
## pr1 pr2 pr3 
##   0   0   1 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1         St2        St3
## (Intercept)                                   0 -27.6037580  3.9045140
## p.movement.angle.1                            0  -7.3637171 -6.8871059
## oppo.diff.avg.shot.and.match.movement.speed   0 -21.0188105 -9.1958914
## p.start.position.y                            0   0.3896248  2.0294815
## oppo.start.position.x                         0  44.8326208 -0.7108344
## oppo.start.position.y                         0  -3.1670927 -7.2587288
## p.diff.avg.shot.and.match.movement.speed      0  26.2285021 14.9131106
## lag.oppo.height.off.net                       0 -26.2464468 13.5404045
## Probalities at zero values of the covariates.
## 0.01975271 2.029864e-14 0.9802473 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2        St3
## (Intercept)                                   0  551.91342  400.36869
## p.movement.angle.1                            0  -87.96094 -153.00183
## oppo.diff.avg.shot.and.match.movement.speed   0  389.09377  473.90423
## p.start.position.y                            0  -83.04395  -94.30928
## oppo.start.position.x                         0 -454.92390 -220.24288
## oppo.start.position.y                         0 -605.43593 -244.06699
## p.diff.avg.shot.and.match.movement.speed      0 -574.22139 -716.03481
## lag.oppo.height.off.net                       0 -172.18652 -800.69210
## Probalities at zero values of the covariates.
## 2.027903e-240 1 1.530946e-66 
## 
## Transition model for state (component) 3 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2       St3
## (Intercept)                                   0  279.49155 -485.0889
## p.movement.angle.1                            0   17.10067 -336.7601
## oppo.diff.avg.shot.and.match.movement.speed   0   60.95021  672.9418
## p.start.position.y                            0  -55.13964 -228.5750
## oppo.start.position.x                         0 -199.15758  190.5777
## oppo.start.position.y                         0   28.57099  132.4371
## p.diff.avg.shot.and.match.movement.speed      0 -740.81958 -128.0758
## lag.oppo.height.off.net                       0 -419.15452  147.1055
## Probalities at zero values of the covariates.
## 4.153003e-122 1 0 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.-1 Re1.0 Re1.1
## St1  0.063 0.937 0.000
## St2  0.355 0.568 0.077
## St3  0.000 0.628 0.372

Plotting Probabilities for Data Set 2 Training Model 2 (3 States)

#plotting Data Set 2 Training Model 2 with 3 states
probs = posterior(ds2.fm2.3s)

    
    
    plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')

    matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
        legend(x='topright', c('State1','State2', "State3"),  fill=1:3, bty='n')

ggplot() + 
  geom_path(data=fed_scale, aes(x = player.total.shot.number, y = winner.return.error)) +
  xlab("Federer shot Number")  + 
  ylab("Actual State") + 
  ggtitle("Actual state") 

Strycova

Comparing Different response variables to find the best model for observing three hidden states

#Response variable(s):winner.return.error & p.advantage.states
#Combination of the two response variables produces the best three state model
ds3.mod2a.3s <- depmix(list(winner.return.error ~ 1, p.advantage ~ 1), transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = stry_scale, nstates = 3, family=list(multinomial("identity"),gaussian()))
 ds3.fm2a.3s <- fit(ds3.mod2a.3s)
## iteration 0 logLik: -2163.841 
## iteration 5 logLik: -2139.626 
## iteration 10 logLik: -2120.41 
## iteration 15 logLik: -2107.574 
## iteration 20 logLik: -2099.112 
## iteration 25 logLik: -2093.206 
## iteration 30 logLik: -2090.919 
## iteration 35 logLik: -2089.829 
## iteration 40 logLik: -2089.127 
## iteration 45 logLik: -2088.5 
## iteration 50 logLik: -2087.509 
## iteration 55 logLik: -2085 
## iteration 60 logLik: -2084.042 
## iteration 65 logLik: -2083.734 
## iteration 70 logLik: -2083.489 
## iteration 75 logLik: -2083.165 
## iteration 80 logLik: -2082.928 
## iteration 85 logLik: -2082.765 
## iteration 90 logLik: -2082.648 
## iteration 95 logLik: -2082.533 
## iteration 100 logLik: -2082.429 
## iteration 105 logLik: -2082.301 
## iteration 110 logLik: -2082.231 
## iteration 115 logLik: -2082.152 
## iteration 120 logLik: -2082.083 
## iteration 125 logLik: -2082.034 
## iteration 130 logLik: -2081.985 
## iteration 135 logLik: -2081.941 
## iteration 140 logLik: -2081.907 
## iteration 145 logLik: -2081.871 
## iteration 150 logLik: -2081.843 
## iteration 155 logLik: -2081.806 
## iteration 160 logLik: -2081.762 
## iteration 165 logLik: -2081.724 
## iteration 170 logLik: -2081.694 
## iteration 175 logLik: -2081.67 
## iteration 180 logLik: -2081.648 
## iteration 185 logLik: -2081.631 
## iteration 190 logLik: -2081.609 
## iteration 195 logLik: -2081.598 
## iteration 200 logLik: -2081.577 
## iteration 205 logLik: -2081.546 
## iteration 210 logLik: -2081.285 
## iteration 215 logLik: -2081.179 
## iteration 220 logLik: -2081.088 
## iteration 225 logLik: -2081.055 
## iteration 230 logLik: -2081.037 
## iteration 235 logLik: -2081.025 
## iteration 240 logLik: -2081.009 
## iteration 245 logLik: -2080.997 
## iteration 250 logLik: -2080.983 
## iteration 255 logLik: -2080.978 
## iteration 260 logLik: -2080.974 
## iteration 265 logLik: -2080.969 
## iteration 270 logLik: -2080.966 
## iteration 275 logLik: -2080.963 
## iteration 280 logLik: -2080.96
## Warning in em.depmix(object = object, maxit = emcontrol$maxit, tol =
## emcontrol$tol, : likelihood decreased on iteration 285
 summary(ds3.fm2a.3s)
## Initial state probabilties model 
## pr1 pr2 pr3 
##   0   0   1 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2        St3
## (Intercept)                                   0  16.503555 -548.47552
## p.movement.angle.1                            0  -1.894245   28.37007
## oppo.diff.avg.shot.and.match.movement.speed   0 -19.431435  255.83909
## p.start.position.y                            0 -34.381847  232.89193
## oppo.start.position.x                         0 -32.030999  973.50152
## oppo.start.position.y                         0  -5.903470  -22.03312
## p.diff.avg.shot.and.match.movement.speed      0  68.043103 -707.57265
## lag.oppo.height.off.net                       0 -21.301031 -252.42039
## Probalities at zero values of the covariates.
## 6.801384e-08 0.9999999 4.29247e-246 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2        St3
## (Intercept)                                   0 -517.22839   69.83920
## p.movement.angle.1                            0   25.35463  202.61840
## oppo.diff.avg.shot.and.match.movement.speed   0  678.33056  185.04648
## p.start.position.y                            0   39.64384  -13.24857
## oppo.start.position.x                         0  180.06844  -62.63398
## oppo.start.position.y                         0 -189.42050  265.59466
## p.diff.avg.shot.and.match.movement.speed      0   25.65182 -601.73190
## lag.oppo.height.off.net                       0  335.76399 -141.28316
## Probalities at zero values of the covariates.
## 4.668978e-31 1.095941e-255 1 
## 
## Transition model for state (component) 3 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2        St3
## (Intercept)                                   0  124.11691  127.89515
## p.movement.angle.1                            0   25.51243   27.71095
## oppo.diff.avg.shot.and.match.movement.speed   0  651.00180  647.39193
## p.start.position.y                            0  576.86429  574.80466
## oppo.start.position.x                         0 -403.21554 -406.33130
## oppo.start.position.y                         0   78.93329   77.72840
## p.diff.avg.shot.and.match.movement.speed      0 -739.25360 -737.94967
## lag.oppo.height.off.net                       0 -101.71513  -98.28909
## Probalities at zero values of the covariates.
## 2.792703e-56 0.02235189 0.9776481 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
## Resp 2 : gaussian 
##     Re1.-1 Re1.0 Re1.1 Re2.(Intercept)   Re2.sd
## St1  0.150 0.850 0.000        1543.860  890.272
## St2  0.042 0.537 0.421        -199.877 3167.476
## St3  0.039 0.961 0.000        2235.783 2253.543
#Comparing Different response variables to find the best model for three states
#Response variable(s):p.advantage.states
ds3.mod2b.3s <- depmix(p.advantage ~ 1, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = stry_scale, nstates = 3, family=gaussian())
 ds3.fm2b.3s <- fit(ds3.mod2b.3s)
## iteration 0 logLik: -2036.25 
## iteration 5 logLik: -2014.021 
## iteration 10 logLik: -2008.674 
## iteration 15 logLik: -2003.685 
## iteration 20 logLik: -1996.694 
## iteration 25 logLik: -1985.815 
## iteration 30 logLik: -1974.985 
## iteration 35 logLik: -1973.506 
## iteration 40 logLik: -1972.948 
## iteration 45 logLik: -1971.01 
## iteration 50 logLik: -1969.78 
## iteration 55 logLik: -1969.657 
## iteration 60 logLik: -1969.611 
## iteration 65 logLik: -1969.478 
## iteration 70 logLik: -1969.15 
## iteration 75 logLik: -1969.069 
## iteration 80 logLik: -1969.059 
## iteration 85 logLik: -1969.057 
## iteration 90 logLik: -1969.053 
## iteration 95 logLik: -1969.051 
## iteration 100 logLik: -1969.047 
## iteration 105 logLik: -1969.035 
## iteration 110 logLik: -1968.997 
## iteration 115 logLik: -1968.912 
## iteration 120 logLik: -1968.819 
## iteration 125 logLik: -1968.776 
## iteration 130 logLik: -1968.764 
## iteration 135 logLik: -1968.761 
## iteration 140 logLik: -1968.76 
## iteration 145 logLik: -1968.759 
## iteration 150 logLik: -1968.759 
## iteration 155 logLik: -1968.758 
## iteration 160 logLik: -1968.758 
## iteration 165 logLik: -1968.757 
## iteration 170 logLik: -1968.757 
## iteration 175 logLik: -1968.757 
## iteration 180 logLik: -1968.757 
## iteration 185 logLik: -1968.756 
## iteration 190 logLik: -1968.756 
## iteration 195 logLik: -1968.756 
## iteration 200 logLik: -1968.756 
## converged at iteration 205 with logLik: -1968.755
 summary(ds3.fm2b.3s)
## Initial state probabilties model 
## pr1 pr2 pr3 
##   0   0   1 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2        St3
## (Intercept)                                   0  139.39821 -254.88848
## p.movement.angle.1                            0 -115.88808  -62.42781
## oppo.diff.avg.shot.and.match.movement.speed   0 -347.04110  490.67592
## p.start.position.y                            0 -506.36715   75.10979
## oppo.start.position.x                         0  -35.30033  279.25123
## oppo.start.position.y                         0  474.60815 -176.74261
## p.diff.avg.shot.and.match.movement.speed      0  476.84632 -151.84694
## lag.oppo.height.off.net                       0 -184.41102 -296.33460
## Probalities at zero values of the covariates.
## 2.884877e-61 1 5.800541e-172 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2        St3
## (Intercept)                                   0 -11.049503 -29.296036
## p.movement.angle.1                            0  -5.905984  -6.762788
## oppo.diff.avg.shot.and.match.movement.speed   0   7.513066  15.210323
## p.start.position.y                            0   4.956859  22.606392
## oppo.start.position.x                         0   7.115727   5.726038
## oppo.start.position.y                         0  -9.150120 -14.138686
## p.diff.avg.shot.and.match.movement.speed      0  23.278137  44.889011
## lag.oppo.height.off.net                       0  30.959022  41.754659
## Probalities at zero values of the covariates.
## 0.9999841 1.589479e-05 1.891849e-13 
## 
## Transition model for state (component) 3 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2         St3
## (Intercept)                                   0  100.76728   19.144551
## p.movement.angle.1                            0  154.60749  -55.248317
## oppo.diff.avg.shot.and.match.movement.speed   0 -407.37454  216.140464
## p.start.position.y                            0   90.31802  -38.942860
## oppo.start.position.x                         0   83.73784  -95.512373
## oppo.start.position.y                         0  -66.54542   -2.704008
## p.diff.avg.shot.and.match.movement.speed      0 -320.32262 -267.781048
## lag.oppo.height.off.net                       0   80.32437  322.722007
## Probalities at zero values of the covariates.
## 1.727136e-44 1 3.562044e-36 
## 
## 
## Response parameters 
## Resp 1 : gaussian 
##     Re1.(Intercept)   Re1.sd
## St1        2993.544 2246.835
## St2        1430.823  808.396
## St3         130.665 2897.216
#Comparing Different response variables to find the best model for three states
#Response variable(s):winner.return.error & p.advantage.states

ds3.mod2c.3s <- depmix(winner.return.error ~ 1, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = stry_scale, nstates = 3, family=multinomial("identity"))
 ds3.fm2c.3s <- fit(ds3.mod2c.3s)
## iteration 0 logLik: -120.9393 
## iteration 5 logLik: -118.9352 
## iteration 10 logLik: -114.3997 
## iteration 15 logLik: -107.7044 
## iteration 20 logLik: -99.8285 
## iteration 25 logLik: -93.05565 
## iteration 30 logLik: -85.94239 
## iteration 35 logLik: -80.71336 
## iteration 40 logLik: -78.80156 
## iteration 45 logLik: -77.75581 
## iteration 50 logLik: -77.2472 
## iteration 55 logLik: -77.09472 
## iteration 60 logLik: -76.9857 
## iteration 65 logLik: -76.95726 
## iteration 70 logLik: -76.93676 
## iteration 75 logLik: -76.18698 
## iteration 80 logLik: -75.99201 
## iteration 85 logLik: -75.94287 
## iteration 90 logLik: -75.90082 
## iteration 95 logLik: -75.87796 
## iteration 100 logLik: -75.86786 
## iteration 105 logLik: -75.8636 
## iteration 110 logLik: -75.85922 
## iteration 115 logLik: -75.85739 
## iteration 120 logLik: -75.85621 
## iteration 125 logLik: -75.85519 
## iteration 130 logLik: -75.85441 
## iteration 135 logLik: -75.85337 
## iteration 140 logLik: -75.85273 
## iteration 145 logLik: -75.8516 
## iteration 150 logLik: -75.85114 
## iteration 155 logLik: -75.85061 
## iteration 160 logLik: -75.85019 
## iteration 165 logLik: -75.84973 
## iteration 170 logLik: -75.84947 
## iteration 175 logLik: -75.84918 
## iteration 180 logLik: -75.84896 
## iteration 185 logLik: -75.84865 
## iteration 190 logLik: -75.84825 
## iteration 195 logLik: -75.84798 
## iteration 200 logLik: -75.84757 
## iteration 205 logLik: -75.8473 
## iteration 210 logLik: -75.84704 
## iteration 215 logLik: -75.84672 
## iteration 220 logLik: -75.84639 
## iteration 225 logLik: -75.84602 
## iteration 230 logLik: -75.84562 
## iteration 235 logLik: -75.84517 
## iteration 240 logLik: -75.84467 
## iteration 245 logLik: -75.84408 
## iteration 250 logLik: -75.84347 
## iteration 255 logLik: -75.84283 
## iteration 260 logLik: -75.84221 
## iteration 265 logLik: -75.84158 
## iteration 270 logLik: -75.84101 
## iteration 275 logLik: -75.8405 
## iteration 280 logLik: -75.84005 
## iteration 285 logLik: -75.83967 
## iteration 290 logLik: -75.83935 
## iteration 295 logLik: -75.83909 
## iteration 300 logLik: -75.83885 
## iteration 305 logLik: -75.83867 
## iteration 310 logLik: -75.83854 
## iteration 315 logLik: -75.83843 
## iteration 320 logLik: -75.83834 
## iteration 325 logLik: -75.83821 
## iteration 330 logLik: -75.83814 
## iteration 335 logLik: -75.83807 
## iteration 340 logLik: -75.83802 
## iteration 345 logLik: -75.83796 
## iteration 350 logLik: -75.83794 
## iteration 355 logLik: -75.83781 
## iteration 360 logLik: -75.83779 
## converged at iteration 365 with logLik: -75.83778
 summary(ds3.fm2c.3s)
## Initial state probabilties model 
## pr1 pr2 pr3 
##   0   0   1 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2        St3
## (Intercept)                                   0  -28.16109  174.32047
## p.movement.angle.1                            0   16.26623  129.02541
## oppo.diff.avg.shot.and.match.movement.speed   0 -265.08124   44.93434
## p.start.position.y                            0  103.99391   76.80812
## oppo.start.position.x                         0  117.04577 -236.14064
## oppo.start.position.y                         0  -34.07272  -38.70078
## p.diff.avg.shot.and.match.movement.speed      0  -26.48461 -100.77639
## lag.oppo.height.off.net                       0  193.66613 -261.54646
## Probalities at zero values of the covariates.
## 1.965996e-76 1.157112e-88 1 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1       St2         St3
## (Intercept)                                   0  100.0226   -3.169156
## p.movement.angle.1                            0  121.2693   43.695277
## oppo.diff.avg.shot.and.match.movement.speed   0 -123.1687 -148.207859
## p.start.position.y                            0  208.0577   -0.364518
## oppo.start.position.x                         0 -167.3211  118.455456
## oppo.start.position.y                         0 -537.1149   34.196433
## p.diff.avg.shot.and.match.movement.speed      0 -531.2505 -156.204664
## lag.oppo.height.off.net                       0 1039.1724  163.620337
## Probalities at zero values of the covariates.
## 3.636931e-44 1 1.528932e-45 
## 
## Transition model for state (component) 3 
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + 
##     p.start.position.y + oppo.start.position.x + oppo.start.position.y + 
##     p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients: 
##                                             St1        St2         St3
## (Intercept)                                   0  -37.27347   0.4807947
## p.movement.angle.1                            0  -64.10489   4.0982909
## oppo.diff.avg.shot.and.match.movement.speed   0 -112.62945   2.2513114
## p.start.position.y                            0   68.60935   6.4941508
## oppo.start.position.x                         0  -17.75711   8.5837089
## oppo.start.position.y                         0  131.82439   1.7661252
## p.diff.avg.shot.and.match.movement.speed      0   19.09418 -27.9321593
## lag.oppo.height.off.net                       0   81.93589  -3.6438528
## Probalities at zero values of the covariates.
## 0.3820645 2.480138e-17 0.6179355 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.-1 Re1.0 Re1.1
## St1  0.000 0.685 0.315
## St2  0.355 0.645 0.000
## St3  0.000 1.000 0.000

plotting Data Set 3 Training Model 2 with 3 states

probs = posterior(ds3.fm2a.3s)



    plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')

    matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
        legend(x='topright', c('State1','State2', "State3"),  fill=1:3, bty='n')

###Training Model 3 Data Set 3 Commented out due to causing errors with knit.

Replicating some models from winter project

#t1m1
library(depmixS4)
library(dplyr)

t1.mod1 <- depmix(winner ~ 1, transition = ~ oppo.speed + ser1 + ser2 + start.x, data = fed16_only, nstates = 2, family=multinomial("identity"))
t1.fm1 <- fit(t1.mod1)
## iteration 0 logLik: -77.76366 
## iteration 5 logLik: -77.48424 
## iteration 10 logLik: -76.90073 
## iteration 15 logLik: -76.14037 
## iteration 20 logLik: -75.62631 
## iteration 25 logLik: -75.35881 
## iteration 30 logLik: -75.11051 
## iteration 35 logLik: -74.79877 
## iteration 40 logLik: -74.43574 
## iteration 45 logLik: -74.04317 
## iteration 50 logLik: -73.6559 
## iteration 55 logLik: -73.31353 
## iteration 60 logLik: -73.03826 
## iteration 65 logLik: -72.82036 
## iteration 70 logLik: -72.63312 
## iteration 75 logLik: -72.45446 
## iteration 80 logLik: -72.26937 
## iteration 85 logLik: -72.07499 
## iteration 90 logLik: -71.89061 
## iteration 95 logLik: -71.73901 
## iteration 100 logLik: -71.62493 
## iteration 105 logLik: -71.54425 
## iteration 110 logLik: -71.48966 
## iteration 115 logLik: -71.45382 
## iteration 120 logLik: -71.43074 
## iteration 125 logLik: -71.41607 
## iteration 130 logLik: -71.40677 
## iteration 135 logLik: -71.401 
## iteration 140 logLik: -71.3974 
## iteration 145 logLik: -71.39516 
## iteration 150 logLik: -71.39376 
## iteration 155 logLik: -71.39287 
## iteration 160 logLik: -71.39232 
## iteration 165 logLik: -71.39198 
## iteration 170 logLik: -71.39177 
## iteration 175 logLik: -71.39164 
## iteration 180 logLik: -71.39156 
## iteration 185 logLik: -71.3915 
## iteration 190 logLik: -71.39147 
## iteration 195 logLik: -71.39145 
## iteration 200 logLik: -71.39144 
## iteration 205 logLik: -71.39143 
## iteration 210 logLik: -71.39143 
## converged at iteration 214 with logLik: -71.39142
summary(t1.fm1)
## Initial state probabilties model 
## pr1 pr2 
##   1   0 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~oppo.speed + ser1 + ser2 + start.x
## Coefficients: 
##             St1        St2
## (Intercept)   0  6.1234140
## oppo.speed    0 -0.0175171
## ser1          0 -2.0220807
## ser2          0  8.1454947
## start.x       0  0.2116409
## Probalities at zero values of the covariates.
## 0.002186173 0.9978138 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~oppo.speed + ser1 + ser2 + start.x
## Coefficients: 
##             St1         St2
## (Intercept)   0  1.25959779
## oppo.speed    0 -0.06784264
## ser1          0  0.54424531
## ser2          0  0.71535248
## start.x       0 -0.16521021
## Probalities at zero values of the covariates.
## 0.2210431 0.7789569 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.0 Re1.1
## St1     0     1
## St2     1     0
library(depmixS4)
t2.mod1 <- depmix(list(winner ~ 1, speed.ratio ~ 1), transition = ~ oppo.speed + ser1 + ser2 + start.x + o.angle + oppo.hit.x + lag.p.angle + lag.speed.ratio, data = fed16_only, nstates = 2, family=list(multinomial("identity"), gaussian()))
t2.fm1<- fit(t2.mod1)
## iteration 0 logLik: -143.5021 
## iteration 5 logLik: -126.2489 
## iteration 10 logLik: -123.843 
## iteration 15 logLik: -123.154 
## iteration 20 logLik: -122.5015 
## iteration 25 logLik: -121.0058 
## iteration 30 logLik: -120.641 
## iteration 35 logLik: -120.5495 
## iteration 40 logLik: -120.4756 
## iteration 45 logLik: -120.4366 
## iteration 50 logLik: -120.4199 
## iteration 55 logLik: -120.412 
## iteration 60 logLik: -120.4066 
## iteration 65 logLik: -120.4012 
## iteration 70 logLik: -120.3918 
## iteration 75 logLik: -120.3746 
## iteration 80 logLik: -120.3439 
## iteration 85 logLik: -120.2986 
## iteration 90 logLik: -120.2568 
## iteration 95 logLik: -120.234 
## iteration 100 logLik: -120.2254 
## iteration 105 logLik: -120.2221 
## iteration 110 logLik: -120.2206 
## iteration 115 logLik: -120.2195 
## iteration 120 logLik: -120.2187 
## iteration 125 logLik: -120.2182 
## iteration 130 logLik: -120.2179 
## converged at iteration 132 with logLik: -120.2179
summary(t2.fm1)
## Initial state probabilties model 
## pr1 pr2 
##   0   1 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~oppo.speed + ser1 + ser2 + start.x + o.angle + oppo.hit.x + 
##     lag.p.angle + lag.speed.ratio
## Coefficients: 
##                 St1        St2
## (Intercept)       0   5.097239
## oppo.speed        0  -2.769014
## ser1              0 -80.897804
## ser2              0  85.995043
## start.x           0  32.533365
## o.angle           0  13.544531
## oppo.hit.x        0  51.275271
## lag.p.angle       0  -4.068974
## lag.speed.ratio   0 -88.178116
## Probalities at zero values of the covariates.
## 0.006076456 0.9939235 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~oppo.speed + ser1 + ser2 + start.x + o.angle + oppo.hit.x + 
##     lag.p.angle + lag.speed.ratio
## Coefficients: 
##                 St1        St2
## (Intercept)       0 -8.3734301
## oppo.speed        0  0.3738778
## ser1              0 -5.5242473
## ser2              0 -2.8491828
## start.x           0  0.6478550
## o.angle           0 -0.1590219
## oppo.hit.x        0  0.5840919
## lag.p.angle       0  0.0375421
## lag.speed.ratio   0  2.1257951
## Probalities at zero values of the covariates.
## 0.9997691 0.0002308688 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
## Resp 2 : gaussian 
##     Re1.0 Re1.1 Re2.(Intercept) Re2.sd
## St1 0.763 0.237           1.262  0.548
## St2 0.877 0.123           0.902  0.219

Stepwise Analysis

Stepwise to find best variables for Depmix for Federer vs Berdych 2016 (Data Set 1, Training Model 1)

##Stepwise for first best variable
#run through all variables to find highest log lik
fcov1 <- lapply(fed16_scale[c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "p.diff.avg.shot.and.match.movement.speed",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 160 with logLik: -69.6267 
## converged at iteration 490 with logLik: -75.29084 
## converged at iteration 204 with logLik: -71.53859 
## converged at iteration 300 with logLik: -74.05906 
## converged at iteration 378 with logLik: -70.67435 
## converged at iteration 274 with logLik: -73.30207 
## converged at iteration 257 with logLik: -73.03561 
## converged at iteration 230 with logLik: -70.70626
#Pulling covergence log likelihoods into a dataframe
fcov1_df <-  as.data.frame(
  c(logLik(fcov1$p.start.position.x), logLik(fcov1$p.start.position.y),
    logLik(fcov1$oppo.start.position.x), logLik(fcov1$oppo.start.position.y),
    logLik(fcov1$p.movement.angle.1),
    logLik(fcov1$p.diff.avg.shot.and.match.movement.speed),
    logLik(fcov1$oppo.diff.avg.shot.and.match.movement.speed),
    logLik(fcov1$diff.p.avg.and.current.shot.speed),
    logLik(fcov1$lag.oppo.height.off.net)
    )
  )

fcov1_df$newcolumn<-c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "p.diff.avg.shot.and.match.movement.speed",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")

names(fcov1_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
fcov1_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik          Variables
## 1           -69.6267 p.start.position.x
###First best variable is p.start.position.x
#Running single variable model of p.start.position.x to look at coefficients

mod1 <- depmix(winner ~ 1, transition = ~ p.start.position.x, data = fed16_scale, nstates = 2, family=multinomial("identity"))
fm1 <- fit(mod1)
## iteration 0 logLik: -77.78931 
## iteration 5 logLik: -77.71084 
## iteration 10 logLik: -77.57877 
## iteration 15 logLik: -77.45259 
## iteration 20 logLik: -77.32209 
## iteration 25 logLik: -77.16141 
## iteration 30 logLik: -76.95374 
## iteration 35 logLik: -76.69281 
## iteration 40 logLik: -76.41159 
## iteration 45 logLik: -76.17172 
## iteration 50 logLik: -75.98834 
## iteration 55 logLik: -75.79927 
## iteration 60 logLik: -75.09728 
## iteration 65 logLik: -73.06424 
## iteration 70 logLik: -72.27577 
## iteration 75 logLik: -71.85849 
## iteration 80 logLik: -71.39553 
## iteration 85 logLik: -70.94939 
## iteration 90 logLik: -70.63778 
## iteration 95 logLik: -70.4384 
## iteration 100 logLik: -70.30736 
## iteration 105 logLik: -70.21648 
## iteration 110 logLik: -70.15506 
## iteration 115 logLik: -70.14649 
## iteration 120 logLik: -70.14242 
## iteration 125 logLik: -70.14049 
## iteration 130 logLik: -70.13959 
## iteration 135 logLik: -70.13917 
## iteration 140 logLik: -70.13897 
## iteration 145 logLik: -70.13888 
## iteration 150 logLik: -70.13887 
## converged at iteration 151 with logLik: -70.13887
summary(fm1)
## Initial state probabilties model 
## pr1 pr2 
##   1   0 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~p.start.position.x
## Coefficients: 
##                    St1       St2
## (Intercept)          0 -38.20915
## p.start.position.x   0 232.47106
## Probalities at zero values of the covariates.
## 1 2.546701e-17 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~p.start.position.x
## Coefficients: 
##                    St1        St2
## (Intercept)          0   4.469282
## p.start.position.x   0 -19.593038
## Probalities at zero values of the covariates.
## 0.01132579 0.9886742 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
##     Re1.0 Re1.1
## St1 0.593 0.407
## St2 0.981 0.019
#Graphing fitted model states for first best variable

probs = posterior(fm1)


    plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')

    matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
        legend(x='topright', c('State1','State2'),  fill=1:2, bty='n')

##Stepwise to find second best variable
#run through all variables to find highest log lik
fcov2 <- lapply(fed16_scale[c( "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 218 with logLik: -70.0557 
## converged at iteration 80 with logLik: -66.88238 
## converged at iteration 235 with logLik: -70.79542 
## converged at iteration 291 with logLik: -69.94087 
## converged at iteration 163 with logLik: -67.95979 
## converged at iteration 143 with logLik: -69.06322 
## converged at iteration 108 with logLik: -67.64185
#Pulling covergence log likelihoods into a dataframe
fcov2_df <-  as.data.frame(
  c(logLik(fcov2$p.start.position.y),
    logLik(fcov2$oppo.start.position.x), logLik(fcov2$oppo.start.position.y),
    logLik(fcov2$p.movement.angle.1),
    logLik(fcov2$oppo.diff.avg.shot.and.match.movement.speed),
    logLik(fcov2$diff.p.avg.and.current.shot.speed),
    logLik(fcov2$lag.oppo.height.off.net)
    )
  )

fcov2_df$newcolumn<-c( "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")

names(fcov2_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
fcov2_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik             Variables
## 1          -66.88238 oppo.start.position.x
##Stepwise to find the third best variable
#second best variable is lag.oppo.height.off.net
#run through all variables to find highest log lik
fcov3 <- lapply(fed16_scale[c("p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + lag.oppo.height.off.net + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 191 with logLik: -66.33359 
## converged at iteration 181 with logLik: -66.32436 
## converged at iteration 175 with logLik: -66.84969 
## converged at iteration 260 with logLik: -64.27658 
## converged at iteration 158 with logLik: -62.66555 
## converged at iteration 161 with logLik: -65.57351
#Pulling covergence log likelihoods into a dataframe
fcov3_df <-  as.data.frame(
  c(logLik(fcov3$p.start.position.y),
    logLik(fcov3$oppo.start.position.x), logLik(fcov3$oppo.start.position.y),
    logLik(fcov3$p.movement.angle.1),
    logLik(fcov3$oppo.diff.avg.shot.and.match.movement.speed),
    logLik(fcov3$diff.p.avg.and.current.shot.speed)  )
  )

fcov3_df$newcolumn<-c("p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed")

names(fcov3_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
fcov3_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik                                   Variables
## 1          -62.66555 oppo.diff.avg.shot.and.match.movement.speed
##Stepwise to find the fourth best variable
library(depmixS4)
#third best variable is oppo.diff.avg.shot.and.match.movement.speed


#run through all variables to find highest log lik
fcov4 <- lapply(fed16_scale[c("p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "diff.p.avg.and.current.shot.speed")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 134 with logLik: -62.01148 
## converged at iteration 141 with logLik: -61.90335 
## converged at iteration 107 with logLik: -62.56972 
## converged at iteration 253 with logLik: -62.3559 
## converged at iteration 94 with logLik: -62.08507
#Pulling covergence log likelihoods into a dataframe
fcov4_df <-  as.data.frame(
  c(logLik(fcov4$p.start.position.y),
    logLik(fcov4$oppo.start.position.x), logLik(fcov4$oppo.start.position.y),
    logLik(fcov4$p.movement.angle.1),
    logLik(fcov4$diff.p.avg.and.current.shot.speed)  )
  )

fcov4_df$newcolumn<-c("p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "diff.p.avg.and.current.shot.speed")

names(fcov4_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
fcov4_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik             Variables
## 1          -61.90335 oppo.start.position.x
##Stepwise to find fifth best variable
#Fourth best variable is oppo.start.position.x


#run through all variables to find highest log lik
fcov5 <- lapply(fed16_scale[c( "p.start.position.y", "oppo.start.position.y",  "p.movement.angle.1", "diff.p.avg.and.current.shot.speed")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 109 with logLik: -61.46449 
## converged at iteration 172 with logLik: -61.89461 
## converged at iteration 79 with logLik: -60.87455 
## converged at iteration 142 with logLik: -60.51499
#Pulling covergence log likelihoods into a dataframe
fcov5_df <-  as.data.frame(
  c(logLik(fcov5$p.start.position.y), logLik(fcov5$oppo.start.position.y),
    logLik(fcov5$p.movement.angle.1),
    logLik(fcov5$diff.p.avg.and.current.shot.speed)  )
  )

fcov5_df$newcolumn<-c("p.start.position.y", "oppo.start.position.y", "p.movement.angle.1", "diff.p.avg.and.current.shot.speed")

names(fcov5_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
fcov5_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik                         Variables
## 1          -60.51499 diff.p.avg.and.current.shot.speed
##Stepwise to find sixth best variable
#fifth best variable is diff.p.avg.and.current.shot.speed
#run through all variables to find highest log lik
fcov6 <- lapply(fed16_scale[c( "p.start.position.y", "oppo.start.position.y",  "p.movement.angle.1")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + diff.p.avg.and.current.shot.speed + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 110 with logLik: -61.08021 
## converged at iteration 111 with logLik: -60.53194
#Pulling covergence log likelihoods into a dataframe
fcov6_df <-  as.data.frame(
  c(logLik(fcov6$p.start.position.y), logLik(fcov6$oppo.start.position.y),
    logLik(fcov6$p.movement.angle.1)  )
  )

fcov6_df$newcolumn<-c("p.start.position.y", "oppo.start.position.y", "p.movement.angle.1")

names(fcov6_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
fcov6_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik          Variables
## 1          -54.00198 p.movement.angle.1
#Model stops improving last variable added to the model is p.movement.angle.1

Running stepwise for Federer vs Berdych 2016/17/18 (Data Set 2 Training Model 2)

#run through all variables to find highest log lik
#tm2.fcov1 is training model 2 fitted covariates 1
tm2.fcov1 <- lapply(fed_scale[c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "p.diff.avg.shot.and.match.movement.speed",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 415 with logLik: -187.7404
#Pulling covergence log likelihoods into a dataframe
tm2.fcov1_df <-  as.data.frame(
  c(logLik(tm2.fcov1$p.start.position.x), logLik(tm2.fcov1$p.start.position.y),
    logLik(tm2.fcov1$oppo.start.position.x), logLik(tm2.fcov1$oppo.start.position.y),
    logLik(tm2.fcov1$p.movement.angle.1),
    logLik(tm2.fcov1$p.diff.avg.shot.and.match.movement.speed),
    logLik(tm2.fcov1$oppo.diff.avg.shot.and.match.movement.speed),
    logLik(tm2.fcov1$diff.p.avg.and.current.shot.speed),
    logLik(tm2.fcov1$lag.oppo.height.off.net)  )
  )

tm2.fcov1_df$newcolumn<-c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "p.diff.avg.shot.and.match.movement.speed",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")

names(tm2.fcov1_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
tm2.fcov1_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik          Variables
## 1          -181.9063 p.movement.angle.1
#first best variable p.movement.angle.1

#run through all variables to find highest log lik
tm2.fcov2 <- lapply(fed_scale[c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 282 with logLik: -183.4989 
## converged at iteration 1 with logLik: -193.5096 
## converged at iteration 316 with logLik: -176.6738 
## converged at iteration 461 with logLik: -180.3282 
## converged at iteration 254 with logLik: -179.3088
#Pulling covergence log likelihoods into a dataframe
tm2.fcov2_df <-  as.data.frame(
  c(logLik(tm2.fcov2$p.start.position.x), logLik(tm2.fcov2$p.start.position.y),
    logLik(tm2.fcov2$oppo.start.position.x), logLik(tm2.fcov2$oppo.start.position.y),
    logLik(tm2.fcov2$p.diff.avg.shot.and.match.movement.speed),
    logLik(tm2.fcov2$oppo.diff.avg.shot.and.match.movement.speed),
    logLik(tm2.fcov2$diff.p.avg.and.current.shot.speed),
    logLik(tm2.fcov2$lag.oppo.height.off.net)  )
  )

tm2.fcov2_df$newcolumn<-c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed",  "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")

names(tm2.fcov2_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
tm2.fcov2_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik                                   Variables
## 1          -176.6738 oppo.diff.avg.shot.and.match.movement.speed
#second best variable is oppo.diff.avg.shot.and.match.movement.speed

#run through all variables to find highest log lik
tm2.fcov3 <- lapply(fed_scale[c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 473 with logLik: -178.4031 
## converged at iteration 270 with logLik: -175.8243 
## converged at iteration 241 with logLik: -177.3179 
## converged at iteration 376 with logLik: -181.2354 
## converged at iteration 441 with logLik: -178.1429
#Pulling covergence log likelihoods into a dataframe
tm2.fcov3_df <-  as.data.frame(
  c(logLik(tm2.fcov3$p.start.position.x), logLik(tm2.fcov3$p.start.position.y),
    logLik(tm2.fcov3$oppo.start.position.x), logLik(tm2.fcov3$oppo.start.position.y),
    logLik(tm2.fcov3$p.diff.avg.shot.and.match.movement.speed),
    logLik(tm2.fcov3$diff.p.avg.and.current.shot.speed),
    logLik(tm2.fcov3$lag.oppo.height.off.net)  )
  )

tm2.fcov3_df$newcolumn<-c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")

names(tm2.fcov3_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
tm2.fcov3_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik          Variables
## 1          -175.8243 p.start.position.y
#third best variable is p.start.position.y
#run through all variables to find highest log lik
tm2.fcov4 <- lapply(fed_scale[c( "p.start.position.x", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 205 with logLik: -173.8228 
## converged at iteration 335 with logLik: -174.9686 
## converged at iteration 301 with logLik: -175.5103 
## converged at iteration 312 with logLik: -175.0519
#Pulling covergence log likelihoods into a dataframe
tm2.fcov4_df <-  as.data.frame(
  c(logLik(tm2.fcov4$p.start.position.x),
    logLik(tm2.fcov4$oppo.start.position.x), logLik(tm2.fcov4$oppo.start.position.y),
    logLik(tm2.fcov4$p.diff.avg.shot.and.match.movement.speed),
    logLik(tm2.fcov4$diff.p.avg.and.current.shot.speed),
    logLik(tm2.fcov4$lag.oppo.height.off.net)  )
  )

tm2.fcov4_df$newcolumn<-c( "p.start.position.x", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")

names(tm2.fcov4_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
tm2.fcov4_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik             Variables
## 1          -173.8228 oppo.start.position.x
#fourth best variable is oppo.start.position.x
#run through all variables to find highest log lik
tm2.fcov5 <- lapply(fed_scale[c( "p.start.position.x", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 255 with logLik: -178.546 
## converged at iteration 220 with logLik: -173.8217 
## converged at iteration 209 with logLik: -172.6809 
## converged at iteration 398 with logLik: -172.6645 
## converged at iteration 203 with logLik: -178.4566 
## converged at iteration 265 with logLik: -173.5052
#Pulling covergence log likelihoods into a dataframe
tm2.fcov5_df <-  as.data.frame(
  c(logLik(tm2.fcov5$p.start.position.x), logLik(tm2.fcov5$oppo.start.position.y),
    logLik(tm2.fcov5$p.diff.avg.shot.and.match.movement.speed),
    logLik(tm2.fcov5$diff.p.avg.and.current.shot.speed),
    logLik(tm2.fcov5$lag.oppo.height.off.net)  )
  )

tm2.fcov5_df$newcolumn<-c( "p.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")

names(tm2.fcov5_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
tm2.fcov5_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik                                Variables
## 1          -172.6645 p.diff.avg.shot.and.match.movement.speed
#model stops improving at this point
#5th best variable alternates between oppo.start.position.y and p.diff.avg.shot.and.match.movement.speed. Adding both.
#run through all variables to find highest log lik
tm2.fcov6 <- lapply(fed_scale[c( "p.start.position.x", "oppo.start.position.x", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 232 with logLik: -178.3589 
## converged at iteration 344 with logLik: -173.2838 
## converged at iteration 205 with logLik: -177.9219 
## converged at iteration 208 with logLik: -168.8888
#Pulling covergence log likelihoods into a dataframe
tm2.fcov6_df <-  as.data.frame(
  c(logLik(tm2.fcov6$p.start.position.x),
    logLik(tm2.fcov6$diff.p.avg.and.current.shot.speed),
    logLik(tm2.fcov6$lag.oppo.height.off.net)  )
  )

tm2.fcov6_df$newcolumn<-c( "p.start.position.x", "diff.p.avg.and.current.shot.speed",  "lag.oppo.height.off.net")

names(tm2.fcov6_df) <- c("convergence.loglik", "Variables")

#print variable with the highest convergence log likelihood
tm2.fcov6_df %>%
   slice(which.max(convergence.loglik))
##   convergence.loglik               Variables
## 1          -168.8888 lag.oppo.height.off.net
#The model stops improving after this point. Last variable added to the model is lag.oppo.height.off.net

Running Stepwise for Data Set 3 (Strycova vs Garcia 2016/2017)

commented out to due to causing errors with knit.

Visualisation

Exploring visualisations of player movement and covariates

Below are:

-some of the visualisations I created when figuring out how to create the animation.

-Analysis of how the covariates and response variables move over time

-Visualisation in the differences between player positions and movement speeds.

Static player position paths

library(grid)

# ggplot(fed16_longest, aes(x=p.start.position.x, xend=p.end.position.x, y=p.start.position.y, yend=p.end.position.y, z=winner))   + stat_summary_hex(fun = function(winner) sum(winner)) + geom_segment(arrow = arrow(angle = 15,))  + xlab("Federer Position X ") + ylab("Federer Position Y") + ggtitle("Federer positions in longest rally") 

 # ggplot(fed16_longest, aes(x=p.start.position.x, xend=p.end.position.x, y=p.start.position.y, yend=p.end.position.y, z=winner))   + 
 #   stat_summary_hex(fun = function(winner) sum(winner)) + 
 #   geom_text(aes(label=shot)) +
 #   geom_path(arrow = arrow(angle = 15))  + 
 #   xlab("Federer Position X ") + 
 #   ylab("Federer Position Y") + 
 #   ggtitle("Federer positions in longest rally") 
#static player positions
trial_df <- filter(fed16_pos, final.shot >= 14)

 p.rally.paths <- ggplot(trial_df, aes(x=p.start.position.x,  y=p.start.position.y))    + 
   scale_x_continuous(breaks = axis_labels$x.break) +
    scale_y_continuous(breaks = axis_labels$y.break) +
    geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
    geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
    coord_fixed() +
   geom_point(aes(alpha = shot))  + 
   geom_path(  arrow = arrow(angle = 15)) + 
   facet_wrap(~rally.number) + 
   xlab("Federer Position X ") + 
   ylab("Federer Position Y") + 
   ggtitle("Federer start and end positions for rallies 14 shots or longer") 
 p.rally.paths <- ggplotly(p.rally.paths)
   
 p.rally.paths
oppo.rally.paths <- ggplot(trial_df, aes(x=oppo.start.position.x,  y=oppo.start.position.y ))    + 
  scale_x_continuous(breaks = axis_labels$x.break) +
  scale_y_continuous(breaks = axis_labels$y.break) +
  geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
  geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
  coord_fixed() +
  geom_point(aes(alpha = shot))  + 
  geom_path(  arrow = arrow(angle = 15)) +   
  facet_wrap(~rally.number) + xlab("Opponent Position X ") + ylab("Opponent Position Y") +
  ggtitle("Opponent start and end positions for for rallies 14 shots or longer")  
oppo.rally.paths <- ggplotly(oppo.rally.paths)
oppo.rally.paths

Movement of covariates and response variables over time

library(dplyr)
library(tidyr)
stry_plot1<- stry_scale %>%
  gather(key = vars, value = measurement, p.start.position.x, oppo.start.position.x, p.diff.avg.shot.and.match.movement.speed)

stry_plot2<- stry_scale %>%
  gather(key = vars, value = measurement, p.start.position.y, lag.oppo.height.off.net)

stry_plot3<- stry_scale %>%
  gather(key = vars, value = measurement, oppo.start.position.y, oppo.diff.avg.shot.and.match.movement.speed, diff.p.avg.and.current.shot.speed )

stry_plot4<- stry_scale %>%
  gather(key = vars, value = measurement, winner.return.error, p.advantage)

ggplot(stry_plot1, aes(x=player.total.shot.number,y=measurement)) + geom_line(color="blue") + facet_grid(vars~., scale="free_y") + xlab("Time") + ggtitle("Federer vs. Berdych 2016: Looking at how variables move together over the match")

ggplot(stry_plot2, aes(x=player.total.shot.number,y=measurement)) + geom_line(color="blue") + facet_grid(vars~., scale="free_y") + xlab("Time") + ggtitle("Federer vs. Berdych 2016: Looking at how variables move together over the match")

ggplot(stry_plot3, aes(x=player.total.shot.number,y=measurement)) + geom_line(color="blue") + facet_grid(vars~., scale="free_y") + xlab("Time") + ggtitle("Federer vs. Berdych 2016: Looking at how variables move together over the match")

ggplot(stry_plot4, aes(x=player.total.shot.number,y=measurement)) + geom_line(color="blue") + facet_grid(vars~., scale="free_y") + xlab("Time") + ggtitle("Federer vs. Berdych 2016: Looking at how variables move together over the match")

####Player/Opponent position & speed differentials

#player baseline/centre distance differentials


ggplot(longral_pos, aes(x=shot,y=p.start.position.base.diff)) +
  geom_point() +
  geom_path() + 
  geom_hline(yintercept = 0, linetype = "dotdash") + 
  facet_wrap(~rally.number) + 
  xlab("Shot Number")  + 
  ylab("Difference in distance from baseline Federer Vs Opponent") + 
  ggtitle("Federer vs Opponent difference in baseline distance in rallies 10 shots or more")

ggplot() + 
  geom_path(data=longral_pos, aes(x = shot, y = p.start.position.y, col="blue")) +
  geom_path(data=longral_pos, aes(x = shot, y = oppo.start.position.y, col="red")) +
  geom_hline(yintercept = 0, linetype = "dotdash") + 
  facet_wrap(~rally.number) + 
  xlab("Shot Number")  + 
  ylab("Distance from centre for Federer and Opponent") + 
  ggtitle("Federer vs Opponent distance from centre in rallies 10 shots or more") +
  scale_colour_manual(name = "Player", values=c("blue","red"), labels= c("Federer","Berdych"))

ggplot() + 
  geom_path(data=longral_pos, aes(x = shot, y = p.start.position.base.dist, 
                                  yend =p.end.position.base.dist, col="blue")) +
  geom_path(data=longral_pos, aes(x = shot, y = oppo.start.position.base.dist,
                                  yend =oppo.end.position.base.dist,col="red")) +
  geom_hline(yintercept = 0, linetype = "dotdash") + 
  facet_wrap(~rally.number) + 
  xlab("Shot Number")  + 
  ylab("Distance from baseline for Federer and Opponent") + 
  ggtitle("Federer vs Opponent distance from baseline in rallies 10 shots or more") +
  scale_colour_manual(name = "Player", values=c("blue","red"), labels= c("Federer","Berdych"))
## Warning: Ignoring unknown aesthetics: yend

## Warning: Ignoring unknown aesthetics: yend

#speed/acceleration differentials
ggplot() + 
  geom_path(data=longral_df, aes(x = shot, y = p.avg.speed, col="blue")) +
  geom_path(data=longral_df, aes(x = shot, y = oppo.avg.speed, col="red")) + 
  facet_wrap(~rally.number) + 
  xlab("Shot Number")  + 
  ylab("Avg Movement Speed m/s  Federer and Opponent") + 
  ggtitle("Federer vs Opponent Avg Movement Speed in Rallies 10 shots or more") +
  scale_colour_manual(name = "Player", values=c("blue","red"), labels= c("Federer","Berdych"))

ggplot(longral_pos, aes(x=shot,y=avg.player.speed.diff)) +
  geom_path() + 
  geom_hline(yintercept = 0, linetype = "dotdash") + 
  facet_wrap(~rally.number) + 
  xlab("Shot Number")  + 
  ylab("Difference Avg Movement Speed m/s  Federer vs Opponent") + 
  ggtitle("Federer vs Opponent difference in Avg Movement Speed m/s  Federer and Opponent in rallies 10 shots or more")

Additional animations

We can also facet the animation by rallies. Code commented out to reduce size of html.